eliagbayani opened this issue 3 months ago
Hi Jen @jhammock, attached is a list of private datasets from our five organizations on opendata.eol.org. I will exclude these datasets from the migration to Zenodo, unless you want me to include some of them. Thanks. private_datasets.txt
Ah yes, we'll need to decide what to do about those. I expect the files used in resource connectors should go into the new docker container. I'll review the "old resources"; possibly those can go into Zenodo as well, but I'll check them individually.
@jhammock @KatjaSchulz All broken URLs on opendata.eol.org are accessible again. That is, URLs written in this long format (previously broken) now work:
In the actual OpenData resource record, the URL now takes this shorter format:
Nonetheless, both URL formats are accessible, so we won't get these types of alerts anymore.
I needed this done before migrating anything to Zenodo. Admittedly, fixing the broken long URLs was a happy accident of making the shorter URLs work :-) Thanks.
Thanks for the update, Eli! I'll appreciate not having that to worry about until we're migrated :)
Wonderful! Thanks Eli.
(Email reply to Jen Hammock's comment of Sun, Aug 4, 2024, 7:49 PM.)
Hi Jen, @jhammock
Thanks.
@jhammock @KatjaSchulz
Tip: If you know the complete title of your record in Zenodo and want to search for it, paste this into the search box:
title:("Your Complete Title")
You can also search by Subject:
subject("EOL Content Partners: Water Body Checklists")
More search tips here: https://help.zenodo.org/guides/search/
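For scripted lookups, the same exact-title query can be sent to Zenodo's records search. This is a minimal sketch, assuming the public endpoint `https://zenodo.org/api/records` with `q` and `size` query parameters; the helper names are hypothetical, and the code only builds the URL (fetch it with any HTTP client).

```python
from urllib.parse import urlencode

# Public records search endpoint (assumed; check Zenodo's REST API docs).
ZENODO_SEARCH_URL = "https://zenodo.org/api/records"


def exact_title_query(title: str) -> str:
    """Wrap a complete title in the title:("...") form from the tip above."""
    # Escape backslashes first, then double quotes, so the phrase stays intact.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'title:("{escaped}")'


def search_url(query: str, size: int = 10) -> str:
    """Return a GET URL that runs `query` against the records endpoint."""
    return f"{ZENODO_SEARCH_URL}?{urlencode({'q': query, 'size': size})}"


if __name__ == "__main__":
    print(search_url(exact_title_query("Africa Tree Database")))
```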
@jhammock @KatjaSchulz Update: I generated an HTML page that will initially help us navigate the individual (public) resources in Zenodo. The page is organized using OpenData's original sections: organizations -> datasets -> resources, since Zenodo doesn't have these kinds of sections. opendata_zenodo.html.zip Please unzip it to get the HTML page. Thanks.
@jhammock @KatjaSchulz All public datasets are now in Zenodo. I have not yet moved the private datasets from opendata.eol.org to Zenodo. Do we need to? If we do move them, they will use the 'restricted' option in Zenodo. Restricted means the record itself is publicly visible, but its files are available only to users who have been granted access. Thanks.
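If the private datasets are migrated, each one's deposit would carry the 'restricted' setting. A minimal sketch of that payload, assuming the field names of Zenodo's legacy deposit API (`access_right`, `access_conditions`); the function name and example strings are hypothetical, and creators and other required fields are left out for brevity.

```python
def restricted_deposit_metadata(title: str, description: str,
                                access_conditions: str) -> dict:
    """Build a deposit payload for a formerly private CKAN dataset.

    Field names follow Zenodo's legacy deposit API as we understand it.
    Creators and other required fields are omitted to keep the sketch short.
    """
    # 'restricted' keeps the record page public but gates the files; the
    # legacy API expects access_conditions to explain how access is granted.
    if not access_conditions.strip():
        raise ValueError("restricted records need non-empty access_conditions")
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "access_right": "restricted",
            "access_conditions": access_conditions,
        }
    }
```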
I think that status ought to suit most if not all such cases. @KatjaSchulz, we should both check, I suppose. If there's something we don't even want to announce that we have, we can move it offline for now.
First private record (restricted), e.g. WoRMS internal: World Register of Marine Species. The 'Restricted' status works as intended: if you're not logged in, you cannot download the file. Will continue with the others.
Status: From: Aug 28
No concerns about the test dataset. It may not be the one currently in use, and we can always make up another.
@eliagbayani I'm trying to orient myself to the zenodo interface. Can you explain this to me?
https://zenodo.org/records/13253933/files/13253933.dat?download=1
It's listed under "Files" at https://zenodo.org/records/13253933
@jhammock The .dat file was a temporary placeholder I used whenever the main file was unavailable during the migration. In this case the main file is https://eol.org/data/full_provider_ids.csv.gz. I assume this file was still inaccessible after a number of retries during the migration, so the script fell back to uploading the .dat file so that the record could be published.
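The migration script itself isn't shown here, so this is only a minimal sketch of the fallback described above: try the real URL a few times, then attach a small placeholder so the record can still be published. The retry count, placeholder name, and function name are assumptions; `fetch` is injected so the policy can be tested offline.

```python
import time
from typing import Callable


def pick_upload(url: str,
                fetch: Callable[[str], bytes],
                placeholder_name: str = "placeholder.dat",
                tries: int = 3,
                delay: float = 0.0) -> tuple[str, bytes]:
    """Return (filename, content) for the file to attach to a Zenodo record.

    Tries to download `url` up to `tries` times; if every attempt fails, falls
    back to a small placeholder so the record can still be published.
    `fetch` is injected (e.g. a wrapper around urllib.request.urlopen) so the
    policy can be exercised without network access.
    """
    for attempt in range(tries):
        try:
            return url.rsplit("/", 1)[-1], fetch(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            if attempt < tries - 1:
                time.sleep(delay)
    note = f"The data file for this record is hosted at {url}\n"
    return placeholder_name, note.encode("utf-8")
```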
So the plan is for the intended files to replace the temp file ultimately, wherever it appears? Is manual editing needed?
Yes, this one needs manual editing:
step 1: click [New Version]
step 2: upload the desired file, then click [Upload files]
step 3: enter the Publication date
step 4: finally, click [Publish]
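The same four steps can in principle be scripted. This is a minimal sketch, assuming Zenodo's legacy deposit API (`/api/deposit/depositions/{id}/actions/newversion`, a draft `links.bucket` URL for uploads, then `actions/publish`) and a personal access token; it only builds the requests, which would each be sent with `urllib.request.urlopen(req)`. Function names are hypothetical.

```python
import json
import urllib.request

# Legacy deposit API base (assumed; Zenodo also documents a newer records API).
DEPOSITS = "https://zenodo.org/api/deposit/depositions"


def _request(method: str, url: str, token: str, body=None,
             content_type: str = "application/json") -> urllib.request.Request:
    """Build (but do not send) an authenticated request."""
    data = None
    if body is not None:
        data = body if isinstance(body, bytes) else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", content_type)
    return req


def new_version(record_id: int, token: str) -> urllib.request.Request:
    # Step 1, [New Version]: the JSON reply's links.latest_draft names the draft.
    return _request("POST", f"{DEPOSITS}/{record_id}/actions/newversion", token)


def upload_file(bucket_url: str, filename: str, content: bytes,
                token: str) -> urllib.request.Request:
    # Step 2, [Upload files]: PUT the bytes into the draft's links.bucket URL.
    return _request("PUT", f"{bucket_url}/{filename}", token, content,
                    "application/octet-stream")


def set_publication_date(draft_id: int, metadata: dict, date: str,
                         token: str) -> urllib.request.Request:
    # Step 3: send the full metadata back with publication_date (YYYY-MM-DD).
    payload = {"metadata": {**metadata, "publication_date": date}}
    return _request("PUT", f"{DEPOSITS}/{draft_id}", token, payload)


def publish(draft_id: int, token: str) -> urllib.request.Request:
    # Step 4, [Publish].
    return _request("POST", f"{DEPOSITS}/{draft_id}/actions/publish", token)
```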
@jhammock, here is the New Version you started but did not complete: https://zenodo.org/uploads/13741713, just in case you are looking for it.
I can't remember starting that process so I discarded it. Just checking:
@jhammock case 1 - Yes, eventually Zenodo can host the files. And yes, your uploading it now will not interfere with my ability to update it automatically later. If case 1 is met, we no longer need a .dat file.
case 2 - Or we provide just the URL (e.g. https://eol.org/data/full_provider_ids.csv.gz) as metadata in the Zenodo record. If case 2 is met, we need a .dat file (or any file; I chose .dat) uploaded in order to publish the Zenodo record.
OK, I can see advantages to both cases, but if zenodo policy permits, I think I crave the redundancy of them hosting a copy of all files we list there. We'd presumably also have one of everything, eventually in your new docker instance, @eliagbayani . @KatjaSchulz do you concur?
Yes I vote for redundancy as well. Thanks.
Okay, I am getting familiar with zenodo metadata edits. I gather a new version of a resource is only required when the files associated with the record are changed. I have created v2 of the identifier map. I have also messed with some of the metadata, in several subsequent edits, and learned that this can be done while preserving the same version-specific doi. Yay!
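The rule of thumb above (new version only when the files change) can be checked mechanically before a bulk update. A minimal sketch, assuming you have already collected each record's current files as filename -> MD5 hex strings (Zenodo lists MD5 checksums per file; strip any "md5:" prefix first). Names are hypothetical.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Hex MD5 digest, the checksum Zenodo lists for each file (assumed)."""
    return hashlib.md5(content).hexdigest()


def needs_new_version(current: dict[str, str], proposed: dict[str, bytes]) -> bool:
    """True if the file set changes (names or contents).

    `current` maps filename -> MD5 hex of the files already on the record;
    `proposed` maps filename -> bytes we intend to publish. If this returns
    False, only metadata is changing and the record can be edited in place,
    keeping its version-specific DOI.
    """
    return {name: md5_hex(data) for name, data in proposed.items()} != current
```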
@KatjaSchulz you should definitely review this one because I named you as the creator. You may prefer to name an institution, which is an option, or to name several creators. For the moment I am also listed, in the contributor category, as a "contact person". We should probably hash out a policy about this kind of metadata in the zenodo context; the aggregate datasets will probably be case by case, but for the resource files we should be able to do something consistent, or a few different consistent things for different kinds of resources.
Thanks Eli, this will be very useful.
(Email reply to Eli Agbayani's search tip of Tue, Sep 17, 2024, 11:52 AM.)
@jhammock @KatjaSchulz Attached is a list of records whose files are saved elsewhere (n=56). If I'm not mistaken, all of them should have a .dat file as their uploaded file, except for one:
[title] => identifier map: current version
[URL] => https://eol.org/data/full_provider_ids.csv.gz
[Zenodo] => https://zenodo.org/records/13253933
whose latest version is now: EOL full taxon identifier map https://zenodo.org/records/13751009
Jen, a question: do you want me to create and run a script that checks whether the URLs are valid and uploads the actual file to its respective Zenodo record? Of course, a new version of each record (Version 2) will be created to hold the uploaded file. If a URL is already broken, I won't change anything.
Or do you want these records handled manually by you and Katja? Thanks. FilesSavedElsewhere.txt (https://github.com/user-attachments/files/17046039/FilesSavedElsewhere.txt)
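The proposed script's first pass (decide which records get a Version 2 and which are left alone) could look like this. A minimal sketch, assuming the list has been parsed into (Zenodo ID, title, URL) triples; the record type, function names, and the HEAD-based liveness check are all assumptions, and some servers reject HEAD requests.

```python
import urllib.request
from typing import Callable, Iterable, NamedTuple


class ExternalFileRecord(NamedTuple):
    zenodo_id: int
    title: str
    url: str  # where the real file lives today


def url_is_live(url: str, timeout: float = 30.0) -> bool:
    """HEAD the URL; redirects are followed, HTTP errors count as broken.

    Some servers reject HEAD, so a real run might retry with a ranged GET.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


def plan_uploads(records: Iterable[ExternalFileRecord],
                 is_live: Callable[[str], bool] = url_is_live):
    """Split records into (to_upload, unchanged); broken URLs are left alone."""
    to_upload, unchanged = [], []
    for rec in records:
        (to_upload if is_live(rec.url) else unchanged).append(rec)
    return to_upload, unchanged
```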
Thanks, Eli!
Give us a moment to go through this list; at a glance a couple of these may just be odd ducks to be archived, or otherwise treated differently. I expect most of them will want that script, on a regular schedule.
More soon!
Jen
(Email reply to Eli Agbayani's comment of Wed, Sep 18, 2024, 11:03 AM.)
Hi Eli,
Jen & I just did a deep-dive on Zenodo and came up with a list of things we would like to change. Here are the things we hope you can do through the API:
Let us know if you have any questions.
Hi Jen, @jhammock These are the 7 records under the EOL computer vision pipelines
I think I set these records initially to 'Restricted'. I'm not sure if my recent bulk updates have accidentally set these to 'Public'. Or have you set these to 'Public'? If not I'll just set them back to 'Restricted'. Thanks.
@KatjaSchulz @jhammock The script has finished the bulk updates (Zenodo). As I mentioned before, Zenodo's 'write API' seems to lag behind what the web interface can do. For one, the API cannot set a Creator's type to 'Organization'; it always defaults to 'Personal'. The API also cannot set the 'role' of a Creator, although it CAN set the 'role' of a Contributor.
Another API setback is that it cannot assign identifiers (e.g. ORCID) to Creators and Contributors.
Anyway, the rest of the requirements were met fine.
Also, I removed all Contributors named 'Eli Agbayani'; these were just remnants of the old CKAN framework. I set others, like 'Jen Hammock' or 'Sarah Miller', as 'Contact Person'. Please tell me if we need to change this. And, as proposed, I set 'Anne Thessen' as 'Data Manager'. Thanks.
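The contributor clean-up above amounts to a small rule set applied to each record's contributor list. A minimal sketch, assuming contributors are dicts with `name` and `type` keys and that the role values are the legacy API's `ContactPerson` / `DataManager`; the rule tables are hypothetical, and their keys must match the names exactly as stored on the records.

```python
# Hypothetical rule tables mirroring the choices above; keys must match the
# names exactly as stored on the records.
CONTRIBUTOR_TYPES = {
    "Jen Hammock": "ContactPerson",
    "Sarah Miller": "ContactPerson",
    "Anne Thessen": "DataManager",
}
DROP_NAMES = {"Eli Agbayani"}  # CKAN-era leftovers


def clean_contributors(contributors: list[dict]) -> list[dict]:
    """Drop leftover contributors and set the 'type' (role) of known people."""
    cleaned = []
    for person in contributors:
        if person["name"] in DROP_NAMES:
            continue
        # Unknown people keep their existing role, defaulting to 'Other'.
        role = CONTRIBUTOR_TYPES.get(person["name"], person.get("type", "Other"))
        cleaned.append({**person, "type": role})
    return cleaned
```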
Note to Eli (to do): this has been the case from the start. Find a way to use the apostrophe in API commands; it makes the API call fail even when it is escaped. For now I have replaced each apostrophe with two underscores "__".
There should be an easy solution, and the API should be able to handle it, but I haven't found it yet.
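We don't know how the script assembles its API calls, so this is only a guess at two common culprits, shown as a minimal sketch with a hypothetical title: a backslash before an apostrophe is not a legal JSON escape, and an apostrophe inside a single-quoted shell argument (e.g. to curl) ends the string early.

```python
import json
import shlex

title = "Eli's example dataset"  # hypothetical title with an apostrophe

# (1) JSON allows only \" \\ \/ \b \f \n \r \t and \uXXXX escapes, so a
#     hand-built payload with a backslash before the apostrophe is invalid.
bad_payload = r'{"metadata": {"title": "Eli\'s example dataset"}}'


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# (2) Letting json.dumps build the body avoids the problem: the apostrophe
#     needs no escaping in JSON at all.
good_payload = json.dumps({"metadata": {"title": title}})

# (3) If the body is pasted into a shell command, the apostrophe closes a
#     single-quoted string; shlex.quote produces a safely quoted argument.
shell_arg = shlex.quote(good_payload)

if __name__ == "__main__":
    print(is_valid_json(bad_payload), is_valid_json(good_payload))
    print(shell_arg)
```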
Thanks Eli, It's unfortunate that the Zenodo API has these limitations, but none of them are a huge deal. Stay tuned for some more bulk tag updates. Hopefully those will be easy.
I did set them to public, Eli, thanks for checking. Katie was inquiring about them; some colleagues of hers were interested in having a look.
Noted Jen. No worries, will leave them as 'Public' then. Thanks.
Update: Not quitting just yet. We can now use the API (bulk updates) to update Creators and Contributors with their identifiers. Supported identifiers include ORCID and GND, but not ISNI. Attached is an example:
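Before a bulk identifier update it may help to reject unsupported identifier types and mistyped ORCIDs up front. A minimal sketch: the ORCID check digit uses the ISO 7064 MOD 11-2 scheme ORCID documents, and 0000-0002-1825-0097 is ORCID's published example identifier; the `orcid`/`gnd` key names and the helper names are assumptions.

```python
SUPPORTED_IDS = {"orcid", "gnd"}  # ISNI was not accepted, per the test above


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final ORCID character with ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15].upper() == expected


def with_identifiers(person: dict, **ids: str) -> dict:
    """Return a copy of a creator/contributor dict with identifiers attached."""
    unknown = set(ids) - SUPPORTED_IDS
    if unknown:
        raise ValueError(f"unsupported identifier(s): {sorted(unknown)}")
    if "orcid" in ids and not orcid_checksum_ok(ids["orcid"]):
        raise ValueError(f"bad ORCID checksum: {ids['orcid']}")
    return {**person, **ids}
```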
@KatjaSchulz, yes, please send me the proposed bulk updates; hopefully they are doable. Thanks.
Hi Eli,
Could you please do a few more tag clean-ups?
Please add the tag "geography" to data sets that currently have one of the following tags:
Please add the tag "descriptions" to data sets that currently have the tag "EOL Content Partners: Wikipedia". Also, we think it would make sense for you to add yourself as the creator (or contributor, whichever you prefer) with the role data manager for the Wikipedia data sets.
Once you have added the new tags, please remove all of the following tags:
Thanks!
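The requested tag clean-up is "add first, then remove", with the additions judged against the tags a record had before anything was removed. A minimal sketch over a list of keyword strings; the trigger and removal sets are placeholders for the lists above, and `hypothetical-region-tag` is invented for illustration. Matching is exact, so a variant like "... 2019" does not trigger an addition.

```python
def retag(keywords: list[str], add_rules: dict[str, set[str]],
          remove: set[str]) -> list[str]:
    """Apply 'add X if any trigger tag is present' rules, then drop `remove` tags.

    Additions are evaluated against the original tags, before anything is
    removed. Matching is exact string equality.
    """
    original = set(keywords)
    result = list(keywords)
    for new_tag, triggers in add_rules.items():
        if original & triggers and new_tag not in result:
            result.append(new_tag)
    return [k for k in result if k not in remove]
```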
Bug report! I think. We've found a few cases of zenodo records that resisted your bulk edits, Eli. This one is an interesting example, as it seems to have resisted both a Subject tag removal and a Contributor role change. Something to do with the history of the file edits, maybe? Or might this indicate transient errors during the running of the batch edit? Anyway, there don't seem to be a ton of these, so it's not critical, but if an easy experiment occurs to you for cleaning these up, it's worth trying.
@jhammock Good catch, Jen. Thanks. Found the culprit: different records with the same title, saved the same way they were in CKAN. The bulk-update script assumed that titles are unique, and therefore missed 281 records, e.g.:
Arctic Biodiversity: Arctic Freshwater Fishes https://zenodo.org/records/13315783 https://zenodo.org/records/13315751
Africa Tree Database https://zenodo.org/records/13312623 https://zenodo.org/records/13312619
Fairbairn, 2013 https://zenodo.org/records/13316319 https://zenodo.org/records/13316311
Ramirez, et al, 2008: Ramirez et al, 2008 https://zenodo.org/records/13310465 https://zenodo.org/records/13310461
Only the second record of each pair was processed. Anyway, all 281 records missed last time have now been processed as well.
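The bug is the classic one of keying a lookup by a non-unique field: a title-keyed dict silently keeps only one record per title. A minimal sketch using two of the pairs listed above (the extra unique title is hypothetical): find the repeated titles, and key bulk updates by Zenodo record ID instead.

```python
from collections import defaultdict


def index_by_title(records: list[tuple[str, int]]) -> dict[str, int]:
    """The flawed assumption: one record per title (later IDs overwrite earlier)."""
    return {title: record_id for title, record_id in records}


def titles_with_several_records(records: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group Zenodo record IDs by title and keep only titles that repeat."""
    by_title: dict[str, list[int]] = defaultdict(list)
    for title, record_id in records:
        by_title[title].append(record_id)
    return {t: ids for t, ids in by_title.items() if len(ids) > 1}
```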
@KatjaSchulz Should I also add the tag 'geography' if the existing tags are:
Or should I add 'geography' strictly for the values without " 2019"?
Thanks.
Good question! We mulled that over, but based on the zenodo search tools decided not. We're not confident of being able to filter conveniently to exclude deprecated datasets, so we don't want to give those any other tags.
I've started to mess around with tags and metadata and wanted to check something before I make a mess. Eventually, we'll need a mapping of old CKAN addresses to their corresponding zenodo addresses in order to update the resource file links in the harvesting layer. I wouldn't say automating this is super important, but if we have such a mapping already or could easily make one it will certainly be useful, and I want to make sure I'm not messing that up. I've started editing the Related Works metadata, adding two things so far:
But more urgently, @eliagbayani , I've deleted a few "is supplement to" relationships, (like this one, not yet removed) thinking we only needed them in case of the file upload difficulties we had earlier. However, if those relationships are present on all our zenodo records, and are the easiest way to trace them back to the ckan records, perhaps I should hold off. Please let me know, what you think about that ckan<->zenodo mapping and in particular if I should leave the supplement relationships alone for that or any other reason. I do want to remove them eventually to avoid confusing our zenodo visitors, but there's no great rush.
@jhammock , I'm exploring and will get back to your message. Thanks.
@jhammock
your introduction of the relationship "is source of" is a welcome addition. It shows a clear link back to eol.org. I can also check if I can do a bulk-update to add the "is source of" relationship.
regarding the ckan<->zenodo mapping. I think I already have something like it. Please check this PDF. EOL_resource_id_and_Zenodo_id_file.pdf
the "is supplement to" relationship is relevant to those records where the file (DwCA) is something that we generate and have a connector for. e.g. FishBase I recommend we leave it for these records as I use it to link to our connectors. That is, to facilitate auto-update of respective Zenodo record after connector finishes. But we can remove it for those we don't have a connector for e.g. Reid et al, 2012
Thanks.
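When the harvesting-layer links are updated, a mapping like the one in the PDF can drive the rewrite. A minimal sketch, assuming the mapping has been exported to CSV with hypothetical columns `eol_resource_id,zenodo_id` (the resource ID 42 below is invented); the download-link shape copies the one quoted earlier in this thread.

```python
import csv
import io
from urllib.parse import quote


def load_resource_map(csv_text: str) -> dict[str, str]:
    """Map EOL resource ID -> Zenodo record ID from a two-column export.

    The column names are hypothetical; the actual mapping was shared as
    EOL_resource_id_and_Zenodo_id_file.pdf.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {row["eol_resource_id"].strip(): row["zenodo_id"].strip() for row in rows}


def zenodo_file_url(zenodo_id: str, filename: str) -> str:
    """Download link in the same shape as the ones quoted earlier in the thread."""
    return f"https://zenodo.org/records/{zenodo_id}/files/{quote(filename)}?download=1"
```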
Thanks for that quick investigation, Eli! Yes, that mapping looks like it will make the updating of our harvest layer links very easy when the time comes. So the important thing is for me not to bother the is-supplement relationships for the live connector resources. Where's the best place for me to refer to for a list of those? In the Jenkins?
If you can handily automate the is-source relationships, that would be grand; if not, no complaints. Let me know- if it is, I'll remove the ones I've entered manually, so you can make a clean job of the whole collection. That'll only need to be done once, and I'll probably end up removing a few afterwards. Not everything with a resource page in the publishing layer is published, approved, and non-redundant :)
@jhammock ,
@jhammock, Confirmed, we can add Related Works -> 'is source of' in bulk-updates. Thanks.
Splendid! I'll leave that to you, then. Thanks :)
Finally finished adding Related Works -> 'is source of' relationships in Zenodo for all published EOL resources (Zenodo). Thanks.
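Taken together, the Related Works rules discussed in this thread can be applied idempotently per record. A minimal sketch, assuming related works are dicts with `identifier` and `relation` keys using the legacy API's `isSupplementTo` / `isSourceOf` values, and assuming an eol.org resource page pattern of `https://eol.org/resources/{id}`; the function name and the ID 396 in the test are hypothetical.

```python
def update_related(related: list[dict], eol_resource_id: int,
                   has_connector: bool) -> list[dict]:
    """Apply the two Related Works rules discussed in this thread.

    * keep 'isSupplementTo' only on records fed by a live connector;
    * ensure exactly one 'isSourceOf' link back to the resource page on eol.org.
    The eol.org URL pattern is an assumption.
    """
    page = f"https://eol.org/resources/{eol_resource_id}"
    kept = [r for r in related
            if has_connector or r.get("relation") != "isSupplementTo"]
    if not any(r.get("relation") == "isSourceOf" and r.get("identifier") == page
               for r in kept):
        kept.append({"identifier": page, "relation": "isSourceOf"})
    return kept
```

Running it twice on the same record leaves the list unchanged, so a bulk update can be re-run safely after partial failures.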
Steps: