Closed giorgiobasile closed 7 years ago
I see what you mean. This is not the only occasion I've had trouble with the resync library, and I've spent many an hour refactoring and trying to get the thing to do what I want. I think the best solution is to drop the dependency on resync and simply have a small XML module capable of reading and writing sitemap XML to and from a Python class structure. Just that. Nothing more... Volunteers?
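To make the proposal concrete, here is a rough sketch of what such a module could look like, using only the standard library. The names Entry, Sitemap, to_xml and from_xml are hypothetical and are not part of rspub-core or resync; the sketch covers only the root element, the rs:md capability attribute, and loc/lastmod entries, so treat it as a starting point rather than a full ResourceSync implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


@dataclass
class Entry:
    """One <url> or <sitemap> child (hypothetical class, for illustration)."""
    loc: str
    lastmod: Optional[str] = None


@dataclass
class Sitemap:
    """A sitemap document; is_index alone decides the root element."""
    capability: str
    is_index: bool = False
    entries: List[Entry] = field(default_factory=list)


def to_xml(sitemap: Sitemap) -> bytes:
    """Serialize to utf-8 bytes, as <sitemapindex> or <urlset>."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    if sitemap.is_index:
        root_tag, child_tag = "sitemapindex", "sitemap"
    else:
        root_tag, child_tag = "urlset", "url"
    root = ET.Element(f"{{{SITEMAP_NS}}}{root_tag}")
    # Advertise the capability in <rs:md capability="...">.
    ET.SubElement(root, f"{{{RS_NS}}}md", {"capability": sitemap.capability})
    for entry in sitemap.entries:
        child = ET.SubElement(root, f"{{{SITEMAP_NS}}}{child_tag}")
        ET.SubElement(child, f"{{{SITEMAP_NS}}}loc").text = entry.loc
        if entry.lastmod:
            ET.SubElement(child, f"{{{SITEMAP_NS}}}lastmod").text = entry.lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def from_xml(data: bytes) -> Sitemap:
    """Parse bytes back into a Sitemap, detecting the kind from the root tag."""
    root = ET.fromstring(data)
    is_index = root.tag == f"{{{SITEMAP_NS}}}sitemapindex"
    child_tag = "sitemap" if is_index else "url"
    md = root.find(f"{{{RS_NS}}}md")
    capability = md.get("capability", "") if md is not None else ""
    entries = [
        Entry(
            loc=child.findtext(f"{{{SITEMAP_NS}}}loc", default=""),
            lastmod=child.findtext(f"{{{SITEMAP_NS}}}lastmod"),
        )
        for child in root.findall(f"{{{SITEMAP_NS}}}{child_tag}")
    ]
    return Sitemap(capability=capability, is_index=is_index, entries=entries)


# Round trip with an index document (example.com URL is a placeholder).
index = Sitemap(
    capability="resourcelist",
    is_index=True,
    entries=[Entry("http://example.com/resourcelist_0000.xml")],
)
xml_bytes = to_xml(index)
restored = from_xml(xml_bytes)
```

Because the root element is chosen from a single explicit flag, an index cannot silently end up as a urlset, which is the failure mode reported in this issue.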
Changed the implementation of save_sitemap in Executors. Tested on Mac and Windows. Behavior is now as expected, i.e. indexes are written as <sitemapindex>. Encoding is still utf-8.
Now the documents are not advertising their capability.
Fixed advertising capability in read and write methods: https://github.com/EHRI/rspub-core/blob/master/rspub/core/executors.py#L388
When writing an index (e.g. resourcelist-index.xml), it is saved as a <urlset> instead of a <sitemapindex>. I verified this using both your CLI and my elasticsearch module. At the moment, I have just overridden the Executor.save_sitemap() method (see here) to avoid going through the ListBaseWithIndex.write() method, which seems to treat the index as a urlset. I tried to dig into the resync library, but I think the problem is that setting sitemap.sitemapindex = True is not the criterion (or at least not the only one) that determines whether the document is saved as a urlset or a sitemapindex.
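For anyone who wants to reproduce the check, a small script along these lines reports which root element a written file actually has. The helper name root_kind is hypothetical, and the inline sample document merely stands in for a real resourcelist-index.xml on disk; only the standard library is assumed.

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_kind(source):
    """Return the local name of the root element ('urlset' or 'sitemapindex').

    `source` is a file path or a binary file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    # ElementTree tags look like "{namespace}localname".
    namespace, _, local_name = root.tag.rpartition("}")
    if namespace.lstrip("{") != SITEMAP_NS:
        raise ValueError(f"unexpected root namespace in {root.tag!r}")
    return local_name


# Stand-in for a wrongly written index file.
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
)
print(root_kind(sample))  # prints: urlset
```

Passing the path of a saved index file instead of the in-memory sample should return sitemapindex once the document is written correctly.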