eliagbayani opened this issue 3 months ago
Hi @jhammock, please see the attached taxon.tab.zip. I just noticed there are many rows in taxon.tab with values like these:

taxonID | scientificName | kingdom | family | taxonRank | taxonomicStatus
--- | --- | --- | --- | --- | ---
Phymatodes sp | Phymatodes | Archaeplastida | | genus |

Do we remove rows where taxonID has " sp" in it? Thanks.
Ah, well spotted. It looks like these are genus-level records, which we like to filter out. They won't be needed for parent-child relationships in this file. Yes, please do remove them, along with any records in occurrences and MoF that connect to them.
Thanks!
Jen
(Sent by email in reply to Eli Agbayani's comment of Mon, Apr 22, 2024, 9:49 AM.)
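The removal agreed above (drop the " sp" taxa plus any occurrence and MoF rows that connect to them) can be sketched in Python as follows. This is a minimal sketch, not the connector actually used; it assumes the occurrence file carries a taxonID column and the MoF file an occurrenceID column, and the function names are hypothetical.

```python
import csv
import io


def is_sp_record(taxon_id: str) -> bool:
    """True for IDs ending in a bare ' sp' / ' sp.' qualifier, e.g. 'Phymatodes sp'."""
    # A plain substring test for " sp" would also match real binomials such as
    # "Phymatodes spinosus", so only a trailing token counts as "sp".
    return taxon_id.strip().endswith((" sp", " sp."))


def read_tab(text: str) -> list:
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))


def filter_sp_records(taxa, occurrences, mof):
    """Drop 'sp' taxa plus the occurrences and MoF rows that link to them.

    Assumes occurrences have a taxonID column and MoF rows an occurrenceID column.
    """
    bad_taxa = {t["taxonID"] for t in taxa if is_sp_record(t["taxonID"])}
    bad_occ = {o["occurrenceID"] for o in occurrences if o["taxonID"] in bad_taxa}
    kept_taxa = [t for t in taxa if t["taxonID"] not in bad_taxa]
    kept_occ = [o for o in occurrences if o["occurrenceID"] not in bad_occ]
    # Rows without an occurrenceID (e.g. child records) are kept unchanged.
    kept_mof = [m for m in mof if m.get("occurrenceID") not in bad_occ]
    return kept_taxa, kept_occ, kept_mof
```

In practice each list would come from reading taxon.tab, the occurrence file and the MoF file with read_tab. MoF child records that reach their parent only through parentMeasurementID (with no occurrenceID) survive this sketch and would need a second pass.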
Hi @jhammock, do we want to add URIs to the EOL Terms file for these measurementMethod, measurementValue and measurementType values?
[Missing measurementMethod] => Array
(
[literature and database review] => 432059
[http://www.nucleodiversus.org/index.php?mod=caracter&id=18] => 8587
[http://www.nucleodiversus.org/index.php?mod=caracter&id=22] => 8967
[http://www.nucleodiversus.org/index.php?mod=caracter&id=7] => 24289
[http://www.nucleodiversus.org/index.php?mod=caracter&id=46] => 24115
[http://www.nucleodiversus.org/index.php?mod=caracter&id=35] => 8678
)
[Missing measurementValue] => Array
(
[http://www.wikidata.org/entity/Q127498] => 175
)
[Missing measurementType] => Array
(
[http://eol.org/schema/terms/Habitat] => 30449
)
Thanks.
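A report like the one above could be produced roughly as sketched here: count, per column, the values that are absent from the set of URIs in the EOL Terms file. This is an illustrative assumption, not the connector's code. known_uris stands in for however the terms file is loaded, and only URI-like values are checked, so free-text entries such as "literature and database review" (which the real report does list) are not counted.

```python
from collections import Counter

CHECKED_COLUMNS = ("measurementMethod", "measurementValue", "measurementType")


def missing_term_report(mof_rows, known_uris, columns=CHECKED_COLUMNS):
    """Count URI-like values per column that are absent from `known_uris`.

    Only values starting with http:// or https:// are checked; numeric and
    free-text values are skipped.
    """
    report = {}
    for col in columns:
        counts = Counter(
            value
            for value in ((row.get(col) or "") for row in mof_rows)
            if value.startswith(("http://", "https://")) and value not in known_uris
        )
        if counts:
            report[f"Missing {col}"] = dict(counts)
    return report
```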
Oh, let me see... Some of these have been superseded.
Please replace:
- the measurementValue http://www.wikidata.org/entity/Q127498 with https://www.wikidata.org/entity/Q12806437
- the measurementType http://eol.org/schema/terms/Habitat with http://purl.obolibrary.org/obo/RO_0002303
The measurementMethods can stay as they are. Those aren't usually structured data; the URLs refer to a publication whose methods were adopted by the authors of our dataset.
Thanks!!
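Jen's two replacements can be applied as an exact-match, per-column mapping, as in this minimal sketch (function and constant names are hypothetical). Matching the whole cell value, rather than doing a substring replacement, leaves any longer URI that merely contains an old one untouched.

```python
# Per-column mapping of superseded URI -> replacement URI, taken from the comment above.
REPLACEMENTS = {
    "measurementValue": {
        "http://www.wikidata.org/entity/Q127498": "https://www.wikidata.org/entity/Q12806437",
    },
    "measurementType": {
        "http://eol.org/schema/terms/Habitat": "http://purl.obolibrary.org/obo/RO_0002303",
    },
}


def apply_replacements(rows, replacements=REPLACEMENTS):
    """Return (new_rows, number_of_cells_changed); input rows are not modified."""
    changed = 0
    out = []
    for row in rows:
        new_row = dict(row)
        for col, mapping in replacements.items():
            old = new_row.get(col)
            if old in mapping:  # whole-cell match only
                new_row[col] = mapping[old]
                changed += 1
        out.append(new_row)
    return out, changed
```

The returned count gives a rough sanity check against the missing-term report (30449 Habitat cells plus 175 Q127498 cells), assuming those figures all come from the same MoF file.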
@jhammock Regarding the measurementMethods: when harvesting on eol.org, a URL that is not found in the EOL Terms file causes an error:
[ERR] [2024-04-22 13:08:45] RuntimeError
[ERR] [2024-04-22 13:08:45] Missing Term for URI http://www.nucleodiversus.org/index.php?mod=caracter&id=7, must be added!
[ERR] [2024-04-22 13:08:45] ../models/store/model_builder.rb:658:in `fail_on_bad_uri'
Oh, that's interesting... This is a "how did this setting get set this way" situation...
Here's an excerpt from the terms file:
The "is_text_only: true" item should cause all values of /measurementMethod to be treated as text. I can't think how it got removed, because I think we've been in this situation before, but I've put it back. I'll ask Jeremy for a terms update.
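The behaviour Jen describes, where an is_text_only flag makes every measurementMethod value be stored as plain text instead of being looked up as a term, can be modelled as below. This is a hypothetical Python model for illustration only: the real check lives in the Ruby harvester (fail_on_bad_uri in model_builder.rb), whose code is not shown in this thread, and classify_value, MissingTermError and text_only_fields are invented names. The error message mirrors the log above.

```python
class MissingTermError(RuntimeError):
    """Raised when a URI-valued field has no entry in the terms list."""


def classify_value(field, value, known_uris, text_only_fields):
    """Return "text" or "term" for a value, or raise MissingTermError.

    Hypothetical model of an is_text_only flag: fields listed in
    text_only_fields are stored verbatim and never looked up as terms.
    """
    if field in text_only_fields:
        return "text"
    if value.startswith(("http://", "https://")):
        if value not in known_uris:
            raise MissingTermError(f"Missing Term for URI {value}, must be added!")
        return "term"
    return "text"
```

With measurementMethod in text_only_fields, the nucleodiversus URL is accepted as text; without the flag, the same value raises the error seen in the harvest log.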
Thanks, Jen! I'll wait for Jeremy's update to our terms file.
OK, apparently terms updates are affected by our current adventures. He has tried, and would like us to test by attempting this publish :)
Status: the TRY resource eventually got published: https://eol.org/resources/504
It looks good in the interface:
https://www.eol.org/pages/647903/data?resource_id=504
https://www.eol.org/pages/11164930/data?resource_id=504
However, I reported to Jeremy that the published TRY resource still shows Phymatodes: https://www.eol.org/pages/47173132/data?resource_id=504
The latest DwCA on OpenData used for the harvest/publish no longer has Phymatodes.
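To confirm that the latest DwCA no longer contains Phymatodes, one option is to scan every .tab file of the unpacked archive for cells that equal the name or start with it followed by a space. A minimal sketch, assuming the archive has been unzipped into one directory of tab-separated files with a header row; find_name is a hypothetical helper. It only inspects the archive, so records already on the site (the ghost records) are outside its reach.

```python
import csv
from pathlib import Path


def find_name(archive_dir, name):
    """Map each .tab file name to the number of data rows mentioning `name`.

    A cell matches if it equals `name` or starts with `name` plus a space,
    e.g. "Phymatodes" or "Phymatodes sp". Files with no matches are omitted.
    """
    hits = {}
    for path in sorted(Path(archive_dir).glob("*.tab")):
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
            next(reader, None)  # skip the header row
            n = sum(
                1
                for row in reader
                if any(cell == name or cell.startswith(name + " ") for cell in row)
            )
        if n:
            hits[path.name] = n
    return hits
```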
Yes, the records look good to me. Gosh, there's a lot of pseudo-duplication in this resource. I might paw through it some day and figure out what we might do about that, but it's not really doing any harm. Anyway, I agree the publish looks successful, though the Phymatodes records, presumably ghost records, were not removed. Link for convenience: https://eol.org/pages/47173132/data?resource_id=504
Oops! Random human error detected! I gave you the wrong URL for "photoautotroph". Please replace http://purl.obolibrary.org/obo/ECOCORE_00000013 with http://purl.obolibrary.org/obo/ECOCORE_00000130 throughout this resource.
@jhammock, noted. I'll soon also be able to check whether the ghost records are finally removed. Thanks.
History:
####################
Jennifer Hammock added a comment - 24/Jan/24 10:32 PM
Another unrelated issue with this resource; forgive me for wedging it into this old ticket. A user has reported a homonym issue (Phymatodes). This is effectively a static resource, and I was inclined to just delete the records for that taxon, as it is a genus name and was given quite specific size values. I see from the harvest page (https://content.eol.org/resources/578) that the last attempted harvest of the resource was not straightforward. It seems to be CKAN-hosted, so I cannot access it, but I think you can? If you can reach it, could you make a slight edit: find and remove the six MoF records and (possibly also six) occurrences that trace back to scientificName = Phymatodes? Then it's probably worth parking it on EOL Archive to try the harvest again. I don't remember what the problem was last harvest, but I'm hoping it was not a problem in the resource file, since it was successfully harvested.
####################
Eli Agbayani added a comment - 25/Jan/24 12:49 AM - edited
Hi Jen, I saw these 5 columns in MoF: 'lifeStage', 'bodyPart', 'meanlog10', 'SDlog10', 'SampleSize'. I believe we want them moved elsewhere? I believe SampleSize goes to MoF as a record of its own with measurementOfTaxon = false. What about the other four? Anyway, here is the breakdown:
[lifeStage] => possible values and count
php update_resources/connectors/remove_MoF_fortaxonID.php '{"resource_id": "TRY_temp2", "resource": "remove_MoF_for_taxonID", "resource_name": "Try Database temp2"}'
php update_resources/connectors/resourceutility.php '{"resource_id": "try_dbase_2024_meta_recoded", "task": "metadata_recoding"}'
php update_resources/connectors/move_col_inMoF_2childinMoF.php '{"resource_id": "try_dbase_2024_meta_recoded", "resource": "move_MoF_col_2childMoF", "resource_name": "Try DB MoF update"}'
####################
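Eli's suggestion that SampleSize become an MoF record of its own with measurementOfTaxon = false could look roughly like the sketch below. It assumes the child record points to its parent through parentMeasurementID, that the child's measurementID is built from the parent's ID plus the column name, and that the caller supplies the measurementType URI; all three are illustrative assumptions, and move_column_to_child is a hypothetical function, not the move_col_inMoF_2childinMoF.php connector.

```python
def move_column_to_child(mof_rows, column, type_uri):
    """Move `column` out of each MoF row into a separate child MoF record.

    Assumed conventions (not confirmed in this thread): the child has
    measurementOfTaxon = "false", links to its parent via parentMeasurementID,
    and gets the ID "<parentID>_<column>". Empty values produce no child.
    """
    out = []
    for row in mof_rows:
        parent = dict(row)  # copy so the caller's rows are not modified
        value = parent.pop(column, "")
        out.append(parent)
        if value:
            out.append({
                "measurementID": f"{parent['measurementID']}_{column}",
                "parentMeasurementID": parent["measurementID"],
                "measurementOfTaxon": "false",
                "measurementType": type_uri,
                "measurementValue": value,
            })
    return out
```

Child rows carry only the keys shown, so when writing the file, a csv.DictWriter given the union of all keys as fieldnames would fill the missing columns with its default restval of an empty string.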