EOL / ContentImport

A placeholder for DATA tickets whenever Jira is unavailable.

TRY database adjustments #7

Open eliagbayani opened 3 months ago

eliagbayani commented 3 months ago

History:

####################

Jennifer Hammock added a comment - 24/Jan/24 10:32 PM

Another unrelated issue with this resource; forgive me for wedging it into this old ticket. A user has reported a homonym issue (Phymatodes). This is effectively a static resource and I was inclined to just delete the records for that taxon, as it is a genus name and was given quite specific size values. I see from the harvest page that the last attempted harvest of the resource was not straightforward: https://content.eol.org/resources/578 . It seems to be CKAN hosted so I cannot access it, but I think you can? If you can reach it, could you do a slight edit: find and remove six MoF records and (possibly also 6) occurrences that trace back to scientific name = Phymatodes? Then it's probably worth parking it on EOL Archive to try the harvest again. I don't remember what the problem was last harvest, but I'm hoping it was not a problem in the resource file, since it was successfully harvested.

####################

Eli Agbayani added a comment - 25/Jan/24 12:49 AM - edited

Hi Jen, I saw these 5 columns in MoF: 'lifeStage', 'bodyPart', 'meanlog10', 'SDlog10', 'SampleSize'. I believe we want them moved elsewhere? I believe samplingSize goes to MoF as a record of its own with measurementOfTaxon = false. What about the other four? Anyway, here is the breakdown: [lifeStage] => possible values and count

php update_resources/connectors/remove_MoF_fortaxonID.php '{"resource_id": "TRY_temp2", "resource": "remove_MoF_for_taxonID", "resource_name": "Try Database temp2"}'

php update_resources/connectors/resourceutility.php '{"resource_id": "try_dbase_2024_meta_recoded", "task": "metadata_recoding"}'

php update_resources/connectors/move_col_inMoF_2childinMoF.php '{"resource_id": "try_dbase_2024_meta_recoded", "resource": "move_MoF_col_2childMoF", "resource_name": "Try DB MoF update"}'

####################
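The SampleSize handling Eli proposes in the history above (a MoF record of its own with measurementOfTaxon = false) could be sketched roughly as follows. This is only an illustration, not the code in move_col_inMoF_2childinMoF.php: the function name, the `parentMeasurementID` link column, the `_samplesize` ID suffix and the placeholder measurementType are all assumptions.

```python
def split_sample_size(row, sample_size_type="SampleSize"):
    """Return the MoF row without its SampleSize column, plus a child row if a value exists.

    Hypothetical sketch: the child record layout is an assumption, not the
    connector's actual output.
    """
    # Copy the row without the SampleSize column; the input row is left untouched.
    parent = {k: v for k, v in row.items() if k != "SampleSize"}
    size = row.get("SampleSize", "")
    if not size:
        return [parent]
    child = {
        "measurementID": parent["measurementID"] + "_samplesize",  # assumed ID scheme
        "parentMeasurementID": parent["measurementID"],            # assumed link column
        "measurementType": sample_size_type,                       # placeholder, not a real URI
        "measurementValue": size,
        "measurementOfTaxon": "false",
    }
    return [parent, child]
```

Rows with an empty SampleSize come back as a single record, so the MoF file only grows where there is a value to move.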

eliagbayani commented 3 months ago

taxon.tab.zip (https://github.com/EOL/ContentImport/files/15063823/taxon.tab.zip)

Hi @jhammock, please see attached. I just noticed there are many rows in taxon.tab with these types of values:

taxonID | scientificName | kingdom | family | taxonRank | taxonomicStatus
Phymatodes sp | Phymatodes | Archaeplastida | | genus |

Do we remove rows where taxonID has " sp" in it? Thanks.

jhammock commented 3 months ago

Ah, well spotted. Looks like the records are meant for genus level records which we like to filter out. They won't be needed for parent-child relationships in this file.... Yes, please do remove them, along with any other records in occurrences and MoF that would connect.

Thanks!

Jen
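A minimal sketch of the removal agreed on above: drop the " sp" taxa, then the occurrences and MoF records that connect to them. It is not the connector's actual code. The function name is hypothetical, rows are assumed to be dicts keyed by column name, and the match is on taxonIDs that end in " sp" (as in the "Phymatodes sp" example) rather than contain it, since a plain substring test would also catch an ID such as "Abies spectabilis" if IDs are built from names.

```python
def drop_sp_taxa(taxa, occurrences, measurements):
    """Remove taxa whose taxonID ends in ' sp', plus linked occurrences and MoF rows."""
    # Assumption: genus-level placeholder IDs look like "Phymatodes sp".
    removed_taxa = {t["taxonID"] for t in taxa if t["taxonID"].endswith(" sp")}
    kept_taxa = [t for t in taxa if t["taxonID"] not in removed_taxa]

    # Occurrences point at taxa via taxonID.
    removed_occ = {o["occurrenceID"] for o in occurrences if o["taxonID"] in removed_taxa}
    kept_occ = [o for o in occurrences if o["occurrenceID"] not in removed_occ]

    # MoF rows point at occurrences via occurrenceID.
    kept_mof = [m for m in measurements if m["occurrenceID"] not in removed_occ]
    return kept_taxa, kept_occ, kept_mof
```

Filtering in this order (taxa, then occurrences, then MoF) keeps the three files consistent, so no record is left pointing at a removed row.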


eliagbayani commented 3 months ago

Hi @jhammock, do we want to add URIs to the EOL Terms file for these measurementMethod, measurementValue and measurementType values?

[Missing measurementMethod] => Array
    (
        [literature and database review] => 432059
        [http://www.nucleodiversus.org/index.php?mod=caracter&id=18] => 8587
        [http://www.nucleodiversus.org/index.php?mod=caracter&id=22] => 8967
        [http://www.nucleodiversus.org/index.php?mod=caracter&id=7] => 24289
        [http://www.nucleodiversus.org/index.php?mod=caracter&id=46] => 24115
        [http://www.nucleodiversus.org/index.php?mod=caracter&id=35] => 8678
    )

[Missing measurementValue] => Array
    (
        [http://www.wikidata.org/entity/Q127498] => 175
    )

[Missing measurementType] => Array
    (
        [http://eol.org/schema/terms/Habitat] => 30449
    )

Thanks.

jhammock commented 3 months ago

Oh, let me see... Some of these have been superseded.

Please replace the value

http://www.wikidata.org/entity/Q127498 with https://www.wikidata.org/entity/Q12806437

measurementType http://eol.org/schema/terms/Habitat with http://purl.obolibrary.org/obo/RO_0002303

The measurementMethods can stay as they are. Those aren't usually structured data. The urls refer to a publication whose methods were adopted by the authors of our dataset.

Thanks!!
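The two replacements requested above could be applied per MoF row along these lines. This is a sketch only: the URIs are the ones given in the comment, while `REPLACEMENTS`, `recode_row` and the dict-per-row layout are assumptions made for illustration.

```python
# Old URI -> new URI, scoped by MoF column (URIs taken from the comment above).
REPLACEMENTS = {
    "measurementValue": {
        "http://www.wikidata.org/entity/Q127498": "https://www.wikidata.org/entity/Q12806437",
    },
    "measurementType": {
        "http://eol.org/schema/terms/Habitat": "http://purl.obolibrary.org/obo/RO_0002303",
    },
}

def recode_row(row):
    """Return a copy of a MoF row with superseded URIs swapped for their replacements."""
    out = dict(row)
    for field, mapping in REPLACEMENTS.items():
        value = out.get(field)
        if value in mapping:
            out[field] = mapping[value]
    return out
```

Scoping each mapping to its column means a URI is only rewritten where the replacement was actually requested, and measurementMethod values pass through unchanged.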

eliagbayani commented 3 months ago

@jhammock Regarding the measurementMethods: harvesting on eol.org raises an error when the URL is not found in the EOL Terms file.

[ERR] [2024-04-22 13:08:45] RuntimeError
[ERR] [2024-04-22 13:08:45] Missing Term for URI http://www.nucleodiversus.org/index.php?mod=caracter&id=7, must be added!
[ERR] [2024-04-22 13:08:45] ../models/store/model_builder.rb:658:in `fail_on_bad_uri'

jhammock commented 3 months ago

Oh, that's interesting... This is a "how did this setting get set this way" situation...

Here's an excerpt from the terms file:

The "is_text_only: true" item should cause all values of /measurementMethod to be treated as text. I can't think how it got removed, because I think we've been in this situation before, but I've put it back. I'll ask Jeremy for a terms update.
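To illustrate the behaviour described here, the sketch below counts URI-like values that are missing from a set of known terms, while skipping any field marked as text-only. It is purely hypothetical and is not the harvester's `fail_on_bad_uri` logic: the function name, its arguments and the rule that "starts with http" means "is a URI" are all assumptions.

```python
from collections import Counter

def missing_term_uris(rows, known_uris, text_only_fields):
    """Count (field, value) pairs whose URI-like value is not a known term."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if field in text_only_fields:
                continue  # treated as free text, so URLs are allowed here
            if value.startswith(("http://", "https://")) and value not in known_uris:
                missing[(field, value)] += 1
    return missing
```

With "measurementMethod" in `text_only_fields`, the nucleodiversus.org URLs would no longer be reported; without it, each would show up as a missing term.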

eliagbayani commented 3 months ago

Thanks Jen! Will wait for Jeremy's update of our terms file.

jhammock commented 3 months ago

OK, apparently terms updates are affected by our current adventures. He has tried, and would like us to test by attempting this publish :)

eliagbayani commented 2 months ago

Status: the TRY resource eventually got published: https://eol.org/resources/504

It looks good in the interface:
https://www.eol.org/pages/647903/data?resource_id=504
https://www.eol.org/pages/11164930/data?resource_id=504

But I reported to Jeremy that the published TRY resource still has Phymatodes: https://www.eol.org/pages/47173132/data?resource_id=504
The latest DwCA (OpenData) used during harvest/publish no longer has Phymatodes.

jhammock commented 2 months ago

Yes, the records look good to me. Gosh, there's a lot of pseudo-duplication in this resource. I might paw through it some day and figure out what we might do about that, but it's not really doing any harm. Anyway, I agree the publish looks successful, though the Phymatodes records, presumably ghost records, were not removed. Link for convenience: https://eol.org/pages/47173132/data?resource_id=504

jhammock commented 2 months ago

Ooops! Random human error detected! I gave you the wrong url for "photoautotroph". Please replace http://purl.obolibrary.org/obo/ECOCORE_00000013 with http://purl.obolibrary.org/obo/ECOCORE_00000130 throughout this resource.

eliagbayani commented 2 months ago

@jhammock, noted. We will soon also be able to check whether the ghost records finally get removed. Thanks.