IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Review the remaining CFA Drafts auto-generated during the Dataverse 4.0 upgrade #262

Closed sbarbosadataverse closed 1 month ago

sbarbosadataverse commented 5 months ago

At CfA, a depositor has a draft that was created during the migration (likely to 4.0) and never published, for obvious reasons. Look under the versions tab: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/23099

In speaking with @landreev Dataverse v4); so it could have been something that had to be done when we migrated from DVN3. The draft was last updated in June 2015 - right after the migration.

Although, this part in the file descriptions: [metadata has been automatically re-extracted from this file after Dataverse upgrade to v.4.0] - maybe that’s all it is, all that extra metadata is what Dataverse 4 automatically extracted from their FITS files after we migrated to Dataverse 4?

@scolapasta If anyone else within the team can possibly remember anything about this, that would be Gustavo.

The Draft was not created by the author and this issue is likely to impact any dataset that was migrated for this purpose (and other purposes?)

[Image: Screen Shot 2024-04-10 at 4 38 23 PM]

scolapasta commented 5 months ago

I don't recall the particular details of why this text was added, but it does look like it also added extra metadata to the dataset metadata.

Regardless we could search for drafts which have files in their metadata with the text: [metadata has been automatically re-extracted from this file after Dataverse upgrade to v.4.0]

and then delete those. Or at least inventory them.
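
A minimal sketch of such an inventory, assuming each draft's file metadata has already been fetched and parsed into dictionaries with `label` and `description` keys. The input shape, the function names, and the example DOIs are hypothetical and only illustrate the text match; they are not the team's actual procedure.

```python
# Marker text quoted in this thread; everything else here is an assumption.
MARKER = (
    "[metadata has been automatically re-extracted from this file "
    "after Dataverse upgrade to v.4.0]"
)


def files_with_marker(file_entries):
    """Return the labels of files whose description contains MARKER."""
    return [
        entry.get("label", "")
        for entry in file_entries
        # A description may be missing or None, so fall back to "".
        if MARKER in (entry.get("description") or "")
    ]


def inventory_drafts(drafts):
    """Map persistent ID -> affected file labels.

    `drafts` is a hypothetical dict of {persistent_id: [file_entry, ...]}.
    Drafts with no matching file are left out of the result.
    """
    result = {}
    for pid, entries in drafts.items():
        hits = files_with_marker(entries)
        if hits:
            result[pid] = hits
    return result


if __name__ == "__main__":
    example = {
        "doi:10.0000/EXAMPLE1": [
            {"label": "a.fits", "description": "FITS image " + MARKER},
            {"label": "readme.txt", "description": None},
        ],
        "doi:10.0000/EXAMPLE2": [{"label": "c.fits"}],
    }
    for pid, labels in inventory_drafts(example).items():
        print(pid, labels)
```

This only builds the inventory; deleting or publishing any draft it lists would remain a separate, manual decision.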

landreev commented 2 months ago

@sbarbosadataverse @scolapasta
There are very few drafts in CFA affected by this, so I was able to review them individually. As I said earlier, I didn't have much recollection of how these drafts were produced, but I was able to recall and/or reconstruct everything and I can now explain exactly what was done:

This was all done on purpose, in coordination with CFA. Specifically, Gus Muench worked with us on improving support for extracting metadata from their FITS files. It was decided to re-run the extraction on all such files in CFA. Drafts were created for all the datasets that were re-processed, with the idea that individual authors would decide whether to keep or delete them.

There are only 4 such drafts still remaining. The one you posted above:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/23099
Plus 3 more:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/10.1088
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/26818
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/29021

If you want, I can easily delete them, just let me know; or you can delete them yourself. But please note that some Astrophysics metadata values were actually added for the 4 datasets above (that's how our FITS file processing worked: metadata values were extracted from individual files and aggregated on the dataset level). Please also note that many, or most, authors appear to have decided to keep these extra metadata in their datasets and have published them since. For example, these extra values were extracted and added in doi:10.7910/DVN/26818:

[Image: Screen Shot 2024-07-22 at 9 12 48 AM]

So, I'm guessing there's a chance it may be useful for this author as well (?). But, up to you. They haven't touched these datasets in almost 10 years.

*) The only actual problem that I saw was that there were 4 files total in the dataset you found, where the generated label ("metadata has been automatically re-extracted from this file ...") was added to the description twice, for some reason. I fixed that.
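
For illustration only, a duplicated label like this could be collapsed with a small helper along these lines. This is a sketch under assumptions: the function name is hypothetical, the label text is the one quoted earlier in this thread, and it is not necessarily how the 4 files were actually fixed.

```python
# Label text as quoted earlier in this thread.
LABEL = (
    "[metadata has been automatically re-extracted from this file "
    "after Dataverse upgrade to v.4.0]"
)


def collapse_repeated_label(description, label=LABEL):
    """Keep the first occurrence of `label` and drop any later repeats.

    Descriptions without the label are returned unchanged. Otherwise all
    whitespace runs (including newlines) are normalized to single spaces.
    """
    first = description.find(label)
    if first == -1:
        return description
    end = first + len(label)
    head = description[:end]
    # Remove every further copy of the label after the first one.
    tail = description[end:].replace(label, "")
    return " ".join((head + tail).split())


if __name__ == "__main__":
    doubled = "FITS image " + LABEL + " " + LABEL
    print(collapse_repeated_label(doubled))
```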

sbarbosadataverse commented 2 months ago

I get it now. So, actual 'value' was added to the dataset, @landreev. In that case, I will publish them all and notify the authors.

Thanks

landreev commented 2 months ago

I'll close the issue then, if that's ok?

landreev commented 1 month ago

OK, closing.