Open djtfmartin opened 1 year ago
Is anyone looking into this one?
Thanks @djtfmartin for flagging these. I'll check with the biocollect team on the duplicated data resources that are being created. The data resources are currently created automatically by biocollect when the project is set up.
The preingestion job currently only harvests those data resources from this API: https://ecodata.ala.org.au/ws//record/listHarvestDataResource?max=200&offset=0&sort=asc
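For anyone wanting to inspect what that endpoint returns, here is a minimal sketch of paging through it with the `max`, `offset` and `sort` parameters from the URL above. The response shape is not described in this thread, so the sketch assumes each page is a plain list of items, and `fetch_page` is a hypothetical callable (for example a thin wrapper around an HTTP client) that takes a URL and returns that list. `page_url` and `iter_pages` are illustrative names, not part of the preingestion job.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the comment above (double slash kept as-is).
BASE_URL = "https://ecodata.ala.org.au/ws//record/listHarvestDataResource"


def page_url(offset, page_size=200, sort="asc"):
    """Build the URL for one page, using the query parameters seen in the thread."""
    query = urlencode({"max": page_size, "offset": offset, "sort": sort})
    return f"{BASE_URL}?{query}"


def iter_pages(fetch_page, page_size=200):
    """Yield every item across all pages.

    fetch_page is a hypothetical callable: it receives a URL and returns
    a list of items (ASSUMPTION: the real response may be wrapped differently).
    """
    offset = 0
    while True:
        items = fetch_page(page_url(offset, page_size))
        yield from items
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < page_size:
            break
        offset += page_size
```

Passing the fetcher in as a parameter keeps the paging logic testable without hitting ecodata.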
@temi
Looks like licenses are provided at record level. @temi these duplicates from biocollect are still appearing in the collectory. Is this related to https://github.com/AtlasOfLivingAustralia/biocollect/issues/1509 ?
Thanks for letting me know @djtfmartin @peggynewman. I thought this was fixed, but unfortunately a silent error is causing this issue. I have a fix for it and will deploy it soon.
@peggynewman @pbrenton @patkyn FYI, I have updated the data resource id on 127 projects. This will cause a big release of records from BioCollect on the next ingest. There are around 162,460 at the moment.
The airflow preingestion job that adds BioCollect datasets is not setting the licence in the collectory. The last BioCollect harvest added 211 new datasets, none of which have licences. Example recently added: https://collections.ala.org.au/public/show/dr22260
In addition, duplicates are being added:
See recent work on #934 to clean up these.
cc @peggynewman
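A rough sketch of how both problems above (missing licences and duplicated datasets) could be checked in one pass over a list of data resources. This is only an illustration: it assumes each data resource is a dict with `uid`, `name` and `licenseType` keys, which are hypothetical field names here and may not match what the collectory actually returns, and it treats two resources as duplicates when their names match after trimming and lower-casing.

```python
from collections import defaultdict


def audit_data_resources(resources):
    """Return (uids with no licence, {normalised name: [uids]} for duplicated names).

    ASSUMPTION: each resource is a dict with 'uid', 'name' and 'licenseType' keys.
    """
    # A missing key, None or an empty string all count as "no licence".
    missing_licence = [r["uid"] for r in resources if not r.get("licenseType")]

    # Group uids by normalised name so repeated names can be spotted.
    by_name = defaultdict(list)
    for r in resources:
        by_name[r["name"].strip().lower()].append(r["uid"])

    duplicates = {name: uids for name, uids in by_name.items() if len(uids) > 1}
    return missing_licence, duplicates
```

Matching on name alone is a deliberately simple heuristic; the real duplicates may need to be matched on the BioCollect project id instead.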