BelgianBiodiversityPlatform / python-dwca-reader

🐍 A Python package to read Darwin Core Archive (DwC-A) files.
BSD 3-Clause "New" or "Revised" License

Archives lacking a metadata attribute in the archive element fail to be read #51

Closed nickynicolson closed 8 years ago

nickynicolson commented 8 years ago

In the DC spec, the metadata attribute of the archive element is not listed as required, but reading an archive without this attribute fails. [Note: is this DC spec current? If not, please add a comment with a pointer to the current spec and I'll update the issue as required.]

Example archive without a metadata attribute in the archive element (in meta.xml): http://zingiberaceae.e-monocot.org/dwca.zip

The error is thrown in ArchiveDescriptor, in the file dwca\descriptors.py. As this method is reading from meta.xml (itself hardcoded as a default), it seems reasonable to try to use a default of meta.xml if the metadata attribute is missing.

niconoe commented 8 years ago

Thanks @nickynicolson !

I'm currently in the process of adding support for other kinds of archives, and this is really helpful since it's closely related.

The case you suggest is a bit different: we do have a metafile, but it doesn't reference any metadata file. I think the best solution in that case is to just gracefully set dwca.metadata to None.
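
To make the idea concrete, here is a minimal sketch of that fallback using only the standard library's xml.etree.ElementTree. The function name metadata_filename and the trimmed-down sample meta.xml are hypothetical, written for illustration; this is not the actual code in descriptors.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down meta.xml: the <archive> element carries
# no metadata attribute, as in the archive reported in this issue.
META_XML_WITHOUT_METADATA = """\
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxa.txt</location></files>
    <id index="0"/>
  </core>
</archive>
"""


def metadata_filename(meta_xml_content):
    """Return the file named by the archive element's metadata attribute.

    Returns None when the attribute is absent, instead of raising.
    (Hypothetical helper, not part of python-dwca-reader.)
    """
    root = ET.fromstring(meta_xml_content)
    # Element.get() returns None for a missing attribute, so no
    # explicit existence check or exception handling is needed.
    return root.get("metadata")
```

With this approach, a caller can test the result against None before trying to open and parse a metadata file, so archives without the attribute are still readable.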

Another option would be to also use a default of EML.xml, but according to the DC spec you mentioned, there's no default value for this attribute.

Do you agree with this course of action? I think that with all this, we'll support a whole new range of archives.

BTW, I often have doubts about the standards when discussing such topics, so I've opened a wiki page at https://github.com/BelgianBiodiversityPlatform/python-dwca-reader/wiki/Things-to-clarify-about-the-standards to collect such questions, that may be useful to report later. Don't hesitate if you want to contribute :)

nickynicolson commented 8 years ago

Thanks @niconoe - and the wiki page is a good idea, I'll use that.