EHRI / rspub-core

Core Python library for ResourceSync publishing
Apache License 2.0

Indexes are generated as urlset #4

Closed giorgiobasile closed 7 years ago

giorgiobasile commented 7 years ago

When an index is written (e.g. resourcelist-index.xml), it is saved as a <urlset> instead of a <sitemapindex>. I verified this using both your CLI and my elasticsearch module. For the moment, I have overridden the Executor.save_sitemap() method (see here) to avoid going through the ListBaseWithIndex.write() method, which seems to treat the index as a urlset. I tried to dig into the resync library, and I think the problem is that setting sitemap.sitemapindex = True is not the criterion (or at least not the only one) that determines whether the document is saved as a urlset or a sitemapindex.
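For anyone who wants to reproduce the check, here is a minimal sketch that reports which root element a written document actually has. It uses only the standard library; the helper name root_kind and the example.com URLs are hypothetical, and the two sample documents are illustrative, not actual rspub output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_kind(xml_text):
    """Return the local name of the root element, e.g. 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    # ElementTree renders namespaced tags as '{namespace}localname'.
    return root.tag.rsplit("}", 1)[-1]


# Illustrative documents (hypothetical URLs): what an index should look like,
# and what a plain list looks like.
index_doc = (
    '<sitemapindex xmlns="%s">'
    "<sitemap><loc>http://example.com/resourcelist_0000.xml</loc></sitemap>"
    "</sitemapindex>" % SITEMAP_NS
)
list_doc = (
    '<urlset xmlns="%s">'
    "<url><loc>http://example.com/resource1</loc></url>"
    "</urlset>" % SITEMAP_NS
)

print(root_kind(index_doc))
print(root_kind(list_doc))
```

Running the same helper on the contents of a generated resourcelist-index.xml would show whether the index was written with the expected root element.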

dans-er commented 7 years ago

I see what you mean. This is not the only occasion on which I've had trouble with the resync library, and I've spent many an hour refactoring and trying to make it do what I want. I think the best solution is to drop the dependency on resync and have a simple xml module capable of reading and writing sitemap xml to and from a Python class structure. Just that. Nothing more... Volunteers?
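As a rough idea of what such a module could look like, here is a minimal sketch built on the standard library only. The class name SitemapDoc and its fields are hypothetical, it handles only the root element and the <loc> entries, and it is not what rspub-core actually implements.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class SitemapDoc:
    """Hypothetical in-memory form of a sitemap document."""
    is_index: bool = False
    locs: List[str] = field(default_factory=list)

    def to_xml(self) -> bytes:
        # The is_index flag alone decides the root and child element names.
        if self.is_index:
            root_name, item_name = "sitemapindex", "sitemap"
        else:
            root_name, item_name = "urlset", "url"
        # Serialize the sitemap namespace as the default namespace.
        ET.register_namespace("", SITEMAP_NS)
        root = ET.Element("{%s}%s" % (SITEMAP_NS, root_name))
        for loc in self.locs:
            item = ET.SubElement(root, "{%s}%s" % (SITEMAP_NS, item_name))
            ET.SubElement(item, "{%s}loc" % SITEMAP_NS).text = loc
        return ET.tostring(root, encoding="utf-8")

    @classmethod
    def from_xml(cls, data: bytes) -> "SitemapDoc":
        root = ET.fromstring(data)
        local = root.tag.rsplit("}", 1)[-1]
        if local not in ("urlset", "sitemapindex"):
            raise ValueError("not a sitemap document: %s" % root.tag)
        locs = [el.text for el in root.iter("{%s}loc" % SITEMAP_NS)]
        return cls(is_index=(local == "sitemapindex"), locs=locs)


# Round trip with a hypothetical URL.
doc = SitemapDoc(is_index=True, locs=["http://example.com/resourcelist_0000.xml"])
data = doc.to_xml()
restored = SitemapDoc.from_xml(data)
```

The point of the design is that a single flag determines the root element on write and is recovered from the root element on read, so an index cannot silently turn into a urlset.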

dans-er commented 7 years ago

Changed the implementation of save_sitemap in Executors. Tested on Mac and Windows. Behavior is now as expected, i.e. indexes are written as <sitemapindex>. Encoding is still utf-8.

dans-er commented 7 years ago

Now the documents are not advertising their capability.

dans-er commented 7 years ago

Fixed advertising capability in read and write methods: https://github.com/EHRI/rspub-core/blob/master/rspub/core/executors.py#L388
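For readers unfamiliar with what "advertising capability" means here: in ResourceSync, a document declares its capability through an rs:md element under the root. The following is a minimal sketch of writing and reading that attribute with the standard library. The function names are hypothetical, and this is not the code at the link above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def write_with_capability(capability, is_index=False):
    """Build an empty sitemap document that advertises the given capability."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    root_name = "sitemapindex" if is_index else "urlset"
    root = ET.Element("{%s}%s" % (SITEMAP_NS, root_name))
    # <rs:md capability="..."/> is where the capability is declared.
    ET.SubElement(root, "{%s}md" % RS_NS, {"capability": capability})
    return ET.tostring(root, encoding="utf-8")


def read_capability(data):
    """Return the advertised capability, or None if the document has no rs:md."""
    root = ET.fromstring(data)
    md = root.find("{%s}md" % RS_NS)
    return None if md is None else md.get("capability")


index_bytes = write_with_capability("resourcelist", is_index=True)
print(read_capability(index_bytes))
```

A document without an rs:md child makes read_capability return None, which corresponds to the "not advertising" state reported earlier in this thread.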