BelgianBiodiversityPlatform / python-dwca-reader

🐍 A Python package to read Darwin Core Archive (DwC-A) files.
BSD 3-Clause "New" or "Revised" License

Headers consistency checks #79

Open niconoe opened 4 years ago

niconoe commented 4 years ago

André informed me of some archives (found in the wild) where there's an inconsistency between the CSV headers and the field list from the metafile.

Should we try to detect those and report the inconsistency?

tucotuco commented 4 years ago

Yes. :-)

niconoe commented 4 years ago

@andrejjh found another one...

@tucotuco, any opinion on how we should handle this? I'm thinking of just throwing an exception in the user's face, but if this is common practice, I might get complaints that python-dwca-reader is too strict. I could also add an option to disable the consistency check.

andrejjh commented 4 years ago

Maybe with an option, e.g. -check_headers

tucotuco commented 4 years ago

What is the expected behaviour? If the metafile references a field that isn't in the data file at the position the metafile says it should be, that to me should be an exception. If the data file has extra fields not mentioned by the metafile, that to me would be fine.
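
The rule described above could be sketched roughly as follows. This is only an illustration, not python-dwca-reader's actual API: `HeaderMismatch`, `short_name`, `find_header_mismatches`, `check_headers` and the `strict` flag are all hypothetical names. It also assumes that the data file's header row uses the short term name (the last segment of the term URI declared in the metafile), which may not hold for every archive.

```python
class HeaderMismatch(Exception):
    """Hypothetical exception; not part of python-dwca-reader."""


def short_name(term):
    # Assumption: the CSV header is the last segment of the term URI,
    # e.g. "http://rs.tdwg.org/dwc/terms/country" -> "country".
    return term.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


def find_header_mismatches(meta_fields, csv_headers):
    """Return (index, expected, found) for each inconsistent metafile field.

    meta_fields: dict mapping column index -> term URI, as declared in the metafile.
    csv_headers: list of column names read from the data file's header line.
    Columns that exist in the data file but are not declared in the metafile
    are ignored, since they are considered harmless.
    """
    mismatches = []
    for index, term in sorted(meta_fields.items()):
        expected = short_name(term)
        # A declared index beyond the end of the header row counts as missing.
        found = csv_headers[index].strip() if index < len(csv_headers) else None
        if found != expected:
            mismatches.append((index, expected, found))
    return mismatches


def check_headers(meta_fields, csv_headers, strict=True):
    """Raise on any mismatch when strict, otherwise just return the mismatches."""
    mismatches = find_header_mismatches(meta_fields, csv_headers)
    if mismatches and strict:
        details = ", ".join(
            f"column {i}: expected {exp!r}, found {got!r}" for i, exp, got in mismatches
        )
        raise HeaderMismatch(f"Metafile and data file headers disagree ({details})")
    return mismatches
```

With `strict=True`, a data file whose column 1 is headed `country` while the metafile declares `scientificName` at index 1 would raise; with `strict=False`, the caller gets the list of mismatches back and can decide whether to warn or ignore them. A trailing column that the metafile never mentions passes in both modes.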
