Closed sbarbosadataverse closed 1 month ago
I don't recall the particular details of why this text was added, but it does look like the same process also added extra metadata to the dataset metadata.
Regardless we could search for drafts which have files in their metadata with the text:
[metadata has been automatically re-extracted from this file after Dataverse upgrade to v.4.0]
and then delete those. Or at least inventory them.
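A minimal sketch of that inventory step, assuming each draft's file list is already available as a list of dicts with `label` and `description` keys. That shape, and the names `MARKER` and `inventory_drafts`, are assumptions for illustration, not the actual Dataverse schema or tooling:

```python
# The marker text quoted above; matched as a substring of a file description.
MARKER = ("[metadata has been automatically re-extracted from this file "
          "after Dataverse upgrade to v.4.0]")


def inventory_drafts(drafts):
    """Return {persistent_id: [file labels]} for drafts that contain marked files.

    `drafts` maps a dataset persistent ID to a list of file entries; each
    entry is assumed to be a dict with "label" and "description" keys.
    """
    affected = {}
    for pid, files in drafts.items():
        # A missing or null description is treated as an empty string.
        marked = [f["label"] for f in files
                  if MARKER in (f.get("description") or "")]
        if marked:
            affected[pid] = marked
    return affected
```

This only builds the inventory; whether to delete anything can then be decided per draft.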
@sbarbosadataverse @scolapasta
There are very few drafts in CFA affected by this, so I was able to review them individually.
As I said earlier, I didn't have much recollection of how these drafts were produced, but I was able to recall and/or reconstruct everything and I can now explain exactly what was done:
This was all done on purpose, in coordination with CFA. Specifically, Gus Muench worked with us on improving support for extracting metadata from their FITS files. It was decided to re-run the extraction on all such files in CFA. Drafts were created for all the datasets that were re-processed, with the idea that individual authors would decide whether to keep or delete them.
There are only 4 such drafts still remaining. The one you posted above:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/23099
Plus 3 more:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/10.1088
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/26818
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/29021
If you want, I can easily delete them - just let me know, or you can delete them yourself. But please note that some Astrophysics metadata values were actually added for the 4 datasets above (that's how our FITS file processing worked: metadata values were extracted from individual files and aggregated at the dataset level). Please also note that many, if not most, authors appear to have decided to keep these extra metadata in their datasets and have since published them. For example, these extra values were extracted and added in doi:10.7910/DVN/26818:
So, I'm guessing there's a chance it may be useful for this author as well (?). But, up to you. They haven't touched these datasets in almost 10 years.
*) The only actual problem that I saw was that there were 4 files total in the dataset you found, where the generated label ("metadata has been automatically re-extracted from this file ...") was added to the description twice, for some reason. I fixed that.
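For reference, a sketch of how such a duplicated label could be collapsed in a description string. This is not the exact fix that was applied, and the function name is hypothetical:

```python
MARKER = ("[metadata has been automatically re-extracted from this file "
          "after Dataverse upgrade to v.4.0]")


def collapse_duplicate_marker(description):
    """Keep the first occurrence of MARKER in a description and drop the rest."""
    first = description.find(MARKER)
    if first == -1:
        # No marker present: return the description unchanged.
        return description
    end = first + len(MARKER)
    # Remove every later copy of the marker, keeping the first one in place.
    cleaned = description[:end] + description[end:].replace(MARKER, "")
    # Normalize whitespace left behind by the removed copies
    # (note: this also flattens any line breaks in the description).
    return " ".join(cleaned.split())
```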
I get it now. So actual "value" was added to the datasets, @landreev. In that case, I will publish them all and notify the authors.
Thanks
I'll close the issue then, if that's ok?
OK, closing.
At CfA, a depositor has a draft that was created during the migration (likely to 4.0) and never published, for obvious reasons. Look under the versions tab: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/23099
In speaking with @landreev Dataverse v4); so it could have been something that had to be done when we migrated from DVN3. The draft was last updated in June 2015 - right after the migration.
Although, there is this part in the file descriptions: [metadata has been automatically re-extracted from this file after Dataverse upgrade to v.4.0] - maybe that’s all it is: all that extra metadata is what Dataverse 4 automatically extracted from their FITS files after we migrated to Dataverse 4?
@scolapasta If anyone else within the team can possibly remember anything about this, that would be Gustavo.
The draft was not created by the author, and this issue is likely to affect any dataset that was migrated for this purpose (and other purposes?).