EOL / tramea

A lightweight server for denormalized EOL data

Mysteriously broken/invalid TraitBank Resources #100

Open jhammock opened 8 years ago

jhammock commented 8 years ago

This is all wrapped up except for a suite of resources of typeSpecimenRepository records. Relevant comments start at April 21

jhammock commented 8 years ago

A data owner has asked about this.

(@jhammock should check his resource when this is completed. Resource page is not visible now but collection is)

jhammock commented 8 years ago

Port Found Zero Traits

http://eol.org/content_partners/4/resources/872
http://eol.org/content_partners/4/resources/886
http://eol.org/content_partners/4/resources/887
http://eol.org/content_partners/4/resources/892
http://eol.org/content_partners/4/resources/893
http://eol.org/content_partners/4/resources/894

Traits Seem To Be Missing

http://eol.org/content_partners/695/resources/774
http://eol.org/content_partners/4/resources/820
http://eol.org/content_partners/604/resources/871
http://eol.org/content_partners/557/resources/750
http://eol.org/content_partners/704/resources/793
http://eol.org/content_partners/709/resources/799 some metadata missing
http://eol.org/content_partners/709/resources/804
http://eol.org/content_partners/129/resources/879

Intentionally Skipping

Skip this one (skip we always meant to delete this one; it duplicates GloBI data) http://eol.org/content_partners/690/resources/765 (Uncertain. This is a mixed resource. I can see images and text but haven't found any trait data from sampling a few taxa)

JRice commented 8 years ago

NOTE: this is going to take a while. :|

The problem is the stupid old-TB query for metadata; it takes a LONG time to get those queries; average over 2 seconds for each one. Ridiculous.

I could re-write it to be more effective, but I have other things to prioritize right now... if I get through those, I will. It'll speed things up in the future, too (we still need to run this command after harvesting data, ATM).

jhammock commented 8 years ago

It may be that Old TraitBank contains a number of unwanted stepchildren that should not be ported over. Most of these are unpublished (you're probably excluding those already), but there may be a few accidentally or ill-advisedly published. http://eol.org/content_partners/611/resources/664 is an example of this. If practical, that should be taken down; it is an early draft. http://eol.org/content_partners/636/resources/712 is another example; if that can be excluded from the migration, that would be great.

Also, if practical: could we have a list of published TB resources that are not listed above? I could check them after porting (I think the resource pages won't be visible before?) and request deletion of any that should be deleted.

JRice commented 8 years ago

Code to get a few of these going (by the resource IDs, over there on the left, which you must choose):

[947, 774, 775, 820, 871, 725, 770].each { |id| TraitBank::ResourcePorter.port(Resource.find(id)) }

JRice commented 8 years ago

Next set:

[713, 828, 765, 776, 750, 791, 793, 800, 780, 799, 736, 814, 739, 715, 827, 804, 726, 727]

Skips IRMNG and 712.
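Running a batch like this by hand gets easier to track if skips and failures are recorded rather than left implicit. The following is only a sketch: `port_batch` is a hypothetical helper (not part of the codebase), and the block stands in for the real `TraitBank::ResourcePorter.port(Resource.find(id))` call so the example runs outside the Rails console.

```ruby
# Hypothetical helper: runs a block for each resource ID, skipping excluded IDs
# and collecting failures instead of aborting the whole batch.
def port_batch(ids, skip: [])
  results = { ported: [], skipped: [], failed: {} }
  ids.uniq.each do |id|
    if skip.include?(id)
      results[:skipped] << id
      next
    end
    begin
      yield id
      results[:ported] << id
    rescue StandardError => e
      # Remember why this resource failed and carry on with the next one.
      results[:failed][id] = e.message
    end
  end
  results
end

# In the console, the block body would be the call quoted in the thread:
#   TraitBank::ResourcePorter.port(Resource.find(id))
# Here a stand-in raises for one ID purely to exercise the failure path.
DEMO_RESULTS = port_batch([713, 828, 712, 765], skip: [712]) do |id|
  raise ArgumentError, "no traits found" if id == 828
end
```

The returned hash gives a record of what was ported, skipped, and failed, which could be pasted back into a thread like this one.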

JRice commented 8 years ago

I had to skip GloBI. 1,563,806 traits... that would take over six weeks to run. I'm going to have to try it after re-writing the code.
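As a rough check on that estimate, here is the arithmetic as a small sketch. It assumes one metadata query per trait (the thread only says the queries average "over 2 seconds for each one"), and the helper names are made up for illustration.

```ruby
GLOBI_TRAIT_COUNT = 1_563_806
SECONDS_PER_WEEK  = 7 * 24 * 60 * 60

# Weeks needed if every trait costs one query of the given duration.
def weeks_to_port(trait_count, seconds_per_query)
  (trait_count * seconds_per_query) / SECONDS_PER_WEEK.to_f
end

# Per-trait time implied by a given total run time in weeks.
def implied_seconds_per_trait(trait_count, weeks)
  (weeks * SECONDS_PER_WEEK) / trait_count.to_f
end

WEEKS_AT_TWO_SECONDS = weeks_to_port(GLOBI_TRAIT_COUNT, 2.0)
SECONDS_FOR_SIX_WEEKS = implied_seconds_per_trait(GLOBI_TRAIT_COUNT, 6)
```

At exactly 2 seconds per trait the run comes to a little over five weeks; a six-week run corresponds to roughly 2.3 seconds per trait, which fits "over 2 seconds" on average.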

JRice commented 8 years ago

For posterity, these seem to be done:

done: 1164 traits http://eol.org/content_partners/743/resources/900
done: 447704 traits http://eol.org/content_partners/196/resources/891
done: 4332 traits http://eol.org/content_partners/684/resources/749
done http://eol.org/content_partners/740/resources/885
done http://eol.org/content_partners/381/resources/947
done http://eol.org/content_partners/695/resources/775
done http://eol.org/content_partners/196/resources/725
done http://eol.org/content_partners/693/resources/770
done http://eol.org/content_partners/709/resources/828
done http://eol.org/content_partners/696/resources/776
done http://eol.org/content_partners/703/resources/791
done http://eol.org/content_partners/713/resources/800
done http://eol.org/content_partners/698/resources/780
done http://eol.org/content_partners/641/resources/736
done http://eol.org/content_partners/604/resources/814
done http://eol.org/content_partners/678/resources/739
done http://eol.org/content_partners/664/resources/715
done http://eol.org/content_partners/724/resources/827
done http://eol.org/content_partners/499/resources/726
done http://eol.org/content_partners/36/resources/727

jhammock commented 8 years ago

For http://eol.org/content_partners/709/resources/799:

some records are present, eg:

http://eol.org/pages/45756646/data#data_point_35110874
http://eol.org/pages/45756645/data#data_point_35124346
http://eol.org/pages/45756644/data#data_point_35124345

others are missing, eg:

http://eol.org/pages/3053347/data
http://eol.org/pages/15024/data

And, many or most taxa in the collection are inaccessible (unpublished) and a lot of them look like previous "bad merge" genus names- so maybe this collection just needs a reindex or something...

JRice commented 8 years ago

GloBI finished last night. It should be checked.

JRice commented 8 years ago

IRMNG finished yesterday. It should be checked.

jhammock commented 8 years ago

IRMNG includes a few oddities:

taxa in the collection that do not show IRMNG in the names tab, eg: (from collection first page) http://eol.org/pages/57668 http://eol.org/pages/89969

Most pages have data, but there are some pages without

A number of pages with NPV appended at the end of the name, which (appropriately) did not merge to the corresponding NPV free name, do not display the NPV in the title of their page, example. This changes the title from the name of a virus to the name of its host (host also often listed in IRMNG)

There are a number of pages that have a weird mix of substring matches evident in the names tab. I think there aren't many, but they tend to sort to the front, so there are a lot on the first page. IRMNG could represent the genus entry, the substring matched entry (often the species epithet = the genus string and there may be several binomials like this), or IRMNG might be missing from that page

I suspect that last thing is related to https://github.com/EOL/tramea/issues/220 and is not IRMNG specific, but IRMNG is big and it's hard to find outside examples.

JRice commented 8 years ago

Okay, unsurprisingly, there are no traits for http://eol.org/pages/327361 in the New TB.

I don't see any traits in the Old TB for that page, either, but that is admittedly harder to nail down...

Ahhh, but it's not in the mappings graph for that resource: SELECT * { GRAPH <http://eol.org/resources/741/mappings> { ?s ?p <http://eol.org/pages/57668> } } LIMIT 1 is empty. So is pages/89969, whereas a page that's known to have one, e.g. pages/17862 does return a trait.
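For anyone repeating this probe across several pages, the query string can be built from the resource and page IDs. This is a sketch only: `mappings_probe_query` is a hypothetical helper, and it just reproduces the query text above; actually sending it to the triple store is left out.

```ruby
# Hypothetical helper: builds the "is this page in the resource's mappings graph?"
# probe quoted above. Integer() rejects anything that is not a plain integer ID.
def mappings_probe_query(resource_id, page_id)
  graph = "http://eol.org/resources/#{Integer(resource_id)}/mappings"
  page  = "http://eol.org/pages/#{Integer(page_id)}"
  "SELECT * { GRAPH <#{graph}> { ?s ?p <#{page}> } } LIMIT 1"
end

# One probe per page mentioned in this comment.
PROBES = [57668, 89969, 17862].map { |page_id| mappings_probe_query(741, page_id) }
```

An empty result for a probe means the page has no mapping in that resource's graph, which is the situation described for pages/57668 and pages/89969.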

So the problem here is that damn collection! :) You're looking at the items in the collection as if they imply that the page should have data, when in fact it just means that the page was in the resource file... whether it had any content or none at all! If it had nothing more than a name, it'll be there. ...This is one of the many reasons I hate the resource collection. :D

Anyway, all this means so far is that there wasn't a problem with the data port. There could have been a problem with the harvest (using PHP)...

...So how do we know which pages should have traits? ...There is currently no way to know that through the UI. In fact, the only way I can think of (other than the SPARQL queries I mention here, but that's a bit biased and isn't open to SPG) is by looking at the resource file... which I'll do now for these guys. It's not easy: you have to find (on the names tab) the exact scientific name used by the resource for the page, then look that up in the occurrence tab, then see if that taxon ID is used in the measurements ... but in this case, I don't even see IRMNG on the names tab!
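The manual lookup chain (name, then taxon ID, then occurrences, then measurements) could be sketched roughly as below. The three-table layout, the hash keys, and the "Nameonly" taxon are assumptions made for illustration; the Malleus IDs are borrowed from the IRMNG rows quoted further down the thread.

```ruby
# Sample rows; the layout and keys are assumptions, not the real file format.
TAXA = [
  { taxon_id: "1357798", name: "Malleus" },
  { taxon_id: "9999999", name: "Nameonly" } # hypothetical: a name with no data
].freeze

OCCURRENCES = [
  { occurrence_id: "1357798_cs", taxon_id: "1357798" },
  { occurrence_id: "1357798_h",  taxon_id: "1357798" }
].freeze

MEASUREMENTS = [
  { occurrence_id: "1357798_cs", type: "http://eol.org/schema/terms/ExtinctionStatus" },
  { occurrence_id: "1357798_h",  type: "http://eol.org/schema/terms/Habitat" }
].freeze

# Follows name -> taxon IDs -> occurrence IDs -> measurement rows.
def measurements_for_name(name, taxa: TAXA, occurrences: OCCURRENCES, measurements: MEASUREMENTS)
  taxon_ids = taxa.select { |t| t[:name] == name }.map { |t| t[:taxon_id] }
  occurrence_ids = occurrences
                   .select { |o| taxon_ids.include?(o[:taxon_id]) }
                   .map { |o| o[:occurrence_id] }
  measurements.select { |m| occurrence_ids.include?(m[:occurrence_id]) }
end
```

A name that appears in the taxa rows but yields no measurements is exactly the "in the collection, but no data expected" case described above.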

Stupid collection. There must be something really wrong with it. (FWIW: recall that this was a port, not a harvest, so the collection was created long ago, using PHP, and it could have any number of problems.)

Yup, confirming with X-Ray vision that pages/57668 doesn't have an entry for IRMNG:

EOL Group on Flickr 114
Initial BioLib.cz Import 394
AquaMaps Resource 467
OBIS depth range resource 556
Ocean Genome Resource- available taxa 11/17/2010 601
DiscoverLife resource 647
Wikipedia 431
Freshwater and Marine Image Bank, University Libraries, U Washington 925
BOLD Systems Resource 428
Taxonomic Hierarchy of COL-China 2012 1139
NCBI Taxonomy 1172
Species 2000 & ITIS Catalogue of Life: April 2013 1188
OBIS Environmental Information 1307
McClain Bivalve Sizes 1314
Integrated Taxonomic Information System (ITIS) 903
Algeabase resource 1280
Inventaire National du Patrimoine Naturel 1388
Paleobiology Database 967
Environments 1317
Phthiraptera 1471
Smithsonian type specimen data 1484

JRice commented 8 years ago

...Hmmn... Looking at the resource file, the lack of an IRMNG entry for Malleus is actually a little surprising. The resource does identify Malleus:

1357798 104977  Malleus         Mém. Soc. H. N. Paris, 82.      Malleidae               genus   Lamarck, 1799   valid   Authority cited elsewhere as Lam., 1799.

Plot thickens. ...So there should be an entry for it, but there is not. Odd.

Further, it ALSO looks like there are indeed occurrences:

1357798_cs      1357798
1357798_h       1357798
1357798_cs      true    http://eol.org/schema/terms/ExtinctionStatus    http://eol.org/schema/terms/extant      http://www.marine.csiro.au/mirrorsearch/ir_search.list_species?gen_id=1357798
1357798_h       true    http://eol.org/schema/terms/Habitat     http://purl.obolibrary.org/obo/ENVO_00000569    http://www.marine.csiro.au/mirrorsearch/ir_search.list_species?gen_id=1357798

JRice commented 8 years ago

Okay, so what happened is that the entry was unpublished:

concept = TaxonConcept.find(57668)
resource = Resource.find(741)
h_id = resource.hierarchy.id
=> 1347
concept.hierarchy_entries.find { |e| e.hierarchy_id == h_id }
=> #<HierarchyEntry id: 20272431, guid: "", identifier: "79609", source_url: "", name_id: 510423, parent_id: 20272430, hierarchy_id: 1347, rank_id: 118, ancestry: "230572|230454|425116|510192|425124|510422", lft: 181916, rgt: 181919, depth: 6, taxon_concept_id: 57668, vetted_id: 5, published: 0, visibility_id: 0, created_at: nil, updated_at: nil, taxon_remarks: nil>

...Why that happened? I cannot say. ...But it clearly affected which data were put into TraitBank, before the port. So all I can say about this is, unhelpfully, "it's not my fault!" :( A re-harvest would probably fix this ... and if it didn't, it would at least shed more light on how we ended up in this weird state.
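One thing worth double-checking in a lookup like this is the operator inside the `find` block: `==` tests each entry, whereas `=` assigns, is always truthy, and hands back the first entry with its `hierarchy_id` overwritten in memory. A plain-Ruby sketch of the difference follows; the `Entry` struct and every value except the hierarchy ID 1347 are hypothetical.

```ruby
Entry = Struct.new(:id, :hierarchy_id, :published)

def sample_entries
  [
    Entry.new(1, 111, 1),  # belongs to some other hierarchy
    Entry.new(2, 1347, 0)  # the entry actually in hierarchy 1347
  ]
end

h_id = 1347

# Comparison: returns the entry whose hierarchy_id really is 1347.
BY_COMPARISON = sample_entries.find { |e| e.hierarchy_id == h_id }

# Assignment: the block returns 1347 (truthy) for the first entry, so `find`
# returns entry 1, now showing hierarchy_id 1347 even though it was 111.
BY_ASSIGNMENT = sample_entries.find { |e| e.hierarchy_id = h_id }
```

Because the assigned value shows up in the inspected record, the output looks plausible either way, so the entry's `id` is the thing to compare.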

JRice commented 8 years ago

...Actually, I can't help but notice that the ID on that unpublished entry is wrong. 79609 is, in fact, not even in the taxon.tab file at all.

[shrug] That doesn't add anything, but it suggests that perhaps the entry we have was from a harvest of an older version of the file. More disturbing, there is no record of that entry having ever been harvested. ...Which is, I suppose, possible, if this were a really, really old harvest, but with a resource ID of 741, I don't see how that's possible (if it were, say, <200, then maybe)...

HarvestEventsHierarchyEntry.where(hierarchy_entry_id: 20272431)
  HarvestEventsHierarchyEntry Load (1.6ms)  SELECT `harvest_events_hierarchy_entries`.* FROM `harvest_events_hierarchy_entries` WHERE `harvest_events_hierarchy_entries`.`hierarchy_entry_id` = 20272431
=> []

ANYWAY... this leaves us with the problem of having an Unreliable collection that we're checking results from. I really don't know a good way to solve that problem. :(

JRice commented 8 years ago

Most pages have data, but there are some pages without

Bah, getting bitten by the fact that this IRMNG resource is actually named "Extant & Habitat resource" with no "IRMNG" in it. :S I'll update my code to display both, sigh... tap tap Okay, so we have these:

IRMNG Extant & Habitat resource 1347

Okay, that's all well and good. ...But do we have traits for these in the old TraitBank? In fact, no. All we have for this page is:

http://rs.tdwg.org/ontology/voc/SPMInfoItems#Distribution = North America - United States - Minnesota (http://eol.org/resources/218)
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Distribution = North America - United States (http://eol.org/resources/218)

...Now... it could be that the problem is a change of taxon concept id. Meaning: the IRMNG data used to be on page 1234, but that page has, since we harvested IRMNG to old TB, moved to page 678. ...That's entirely possible, and there's little we can do about that: the data are lost. :( ...Thinking about that, though, I think I could avoid that happening again... but it would be tricky.

Hmmmn. That having been said, I see that there were several pages that merged to this one: [2468557, 6165691, 11161286, 16895665, 19850177, 22893729]. That's what happened. There's a bunch of data on that page, too:

11161286

  http://purl.obolibrary.org/obo/TO_0000207 = 6 (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0000207 = 9 (http://eol.org/resources/727)
  http://rs.tdwg.org/dwc/terms/verbatimElevation = 100.7 (http://eol.org/resources/820)
  http://rs.tdwg.org/dwc/terms/verbatimElevation = 143.2 (http://eol.org/resources/820)
  http://rs.tdwg.org/dwc/terms/verbatimElevation = 752.1 (http://eol.org/resources/820)
  http://rs.tdwg.org/dwc/terms/verbatimElevation = 0.0 (http://eol.org/resources/820)
  http://eol.org/schema/terms/FrostFreeDays = 88 (http://eol.org/resources/727)
  http://eol.org/schema/terms/PlantingDensity = 320 (http://eol.org/resources/727)
  http://eol.org/schema/terms/PlantingDensity = 1280 (http://eol.org/resources/727)
  http://eol.org/schema/terms/PrecipitationTolerance = 10 (http://eol.org/resources/727)
  http://eol.org/schema/terms/PrecipitationTolerance = 104 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilDepth = 12 (http://eol.org/resources/727)
  http://eol.org/schema/terms/TemperatureTolerance = -62 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilPH = 7.5 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilPH = 5 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SeedPerPound = 270000 (http://eol.org/resources/727)
  http://eol.org/schema/terms/NativeRange = United States (USA) (http://eol.org/resources/750)
  http://eol.org/schema/terms/PlantHabit = http://eol.org/schema/terms/shrub (http://eol.org/resources/750)
  http://eol.org/schema/terms/ExtinctionStatus = http://eol.org/schema/terms/extant (http://eol.org/resources/741)
  http://purl.bioontology.org/ontology/SNOMEDCT/260865002 = http://eol.org/schema/terms/moderateRate (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0000624 = http://eol.org/schema/terms/allelopathyUnknown (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/PATO_0000050 = http://purl.obolibrary.org/obo/PATO_0001604 (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/PATO_0001729 = http://purl.obolibrary.org/obo/PATO_0001731 (http://eol.org/resources/727)
  http://eol.org/schema/terms/GrassGrowthType = http://eol.org/schema/terms/lowGrowingGrassNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/ResproutAbility = http://eol.org/schema/terms/ResproutYes (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/GO_0009399 = http://purl.bioontology.org/ontology/SNOMEDCT/260413007 (http://eol.org/resources/727)
  http://eol.org/schema/terms/HumanLivestockToxicity = http://purl.obolibrary.org/obo/PATO_0000394 (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/PATO_0000052 = http://purl.obolibrary.org/obo/PATO_0000622 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilRequirements = http://eol.org/schema/terms/coarseSoilNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilRequirements = http://eol.org/schema/terms/mediumSoilNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/SoilRequirements = http://eol.org/schema/terms/fineSoilNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/GerminationRequirements = http://eol.org/schema/terms/coldStratificationYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PrimaryMacronutrientRequirements = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C49507 (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0000276 = http://purl.obolibrary.org/obo/PATO_0002393 (http://eol.org/resources/727)
  http://eol.org/schema/terms/FireTolerance = http://purl.obolibrary.org/obo/PATO_0000461 (http://eol.org/resources/727)
  http://eol.org/schema/terms/HedgeTolerance = http://purl.obolibrary.org/obo/PATO_0002394 (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0006001 = http://purl.bioontology.org/ontology/SNOMEDCT/260413007 (http://eol.org/resources/727)
  http://eol.org/schema/terms/ShadeTolerance = http://purl.obolibrary.org/obo/PATO_0002393 (http://eol.org/resources/727)
  http://eol.org/schema/terms/BloomPeriod = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94731 (http://eol.org/resources/727)
  http://eol.org/schema/terms/CommercialAvailability = http://eol.org/schema/terms/routinelyAvailable (http://eol.org/resources/727)
  http://eol.org/schema/terms/SeedPeriodBegin = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94732 (http://eol.org/resources/727)
  http://eol.org/schema/terms/SeedPeriodEnd = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94732 (http://eol.org/resources/727)
  http://eol.org/schema/terms/FruitPersistence = http://eol.org/schema/terms/fruitPersistentYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByBareRootYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByBulbsNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByContainerYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByCormsNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByCuttingsYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedBySeedYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedBySodNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedBySprigsYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/PropagationMethod = http://eol.org/schema/terms/propagatedByTubersNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/SeedlingSurvival = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C25227 (http://eol.org/resources/727)
  http://eol.org/schema/terms/GrainType = http://eol.org/schema/terms/smallGrainNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/berryNutSeedYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/christmasTreeNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/fodderNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/lumberNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/navalStoreNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/nurseryStockYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/BrowseAnimalPalatability = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C54722 (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/palatableHumansYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/GrazeAnimalPalatability = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C54722 (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/postProductNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/pulpwoodNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/Uses = http://eol.org/schema/terms/veneerNo (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0002725 = http://eol.org/schema/terms/perennial (http://eol.org/resources/727)
  http://eol.org/schema/terms/PlantHabit = http://eol.org/schema/terms/subshrub (http://eol.org/resources/727)
  http://eol.org/schema/terms/NativeRange = St. Pierre and Miquelon (France) (http://eol.org/resources/727)
  http://eol.org/schema/terms/NativeIntroducedRange = Canada (http://eol.org/resources/727)
  http://eol.org/schema/terms/NativeIntroducedRange = Alaska, USA (http://eol.org/resources/727)
  http://eol.org/schema/terms/NativeIntroducedRange = Lower 48 United States of America (http://eol.org/resources/727)
  http://eol.org/schema/terms/ActiveGrowthPeriod = http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94731 (http://eol.org/resources/727)
  http://eol.org/schema/terms/BloatPotential = http://purl.bioontology.org/ontology/SNOMEDCT/260413007 (http://eol.org/resources/727)
  http://eol.org/schema/terms/CNRatio = http://eol.org/schema/terms/mediumCNRatio (http://eol.org/resources/727)
  http://sweet.jpl.nasa.gov/2.3/humanAgriculture.owl#Horticulture = http://eol.org/schema/terms/coppicePotentialNo (http://eol.org/resources/727)
  http://eol.org/schema/terms/FireResistance = http://eol.org/schema/terms/fireResistantYes (http://eol.org/resources/727)
  http://sweet.jpl.nasa.gov/2.3/humanAgriculture.owl#Horticulture = http://eol.org/schema/terms/fallConspicuousNo (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0000537 = http://purl.obolibrary.org/obo/PATO_0000323 (http://eol.org/resources/727)
  http://sweet.jpl.nasa.gov/2.3/humanAgriculture.owl#Horticulture = http://eol.org/schema/terms/flowerConspicuousNo (http://eol.org/resources/727)
  http://purl.obolibrary.org/obo/TO_0000326 = http://purl.obolibrary.org/obo/PATO_0000320 (http://eol.org/resources/727)
  http://eol.org/schema/terms/FoliagePorositySummer = http://eol.org/schema/terms/moderatePorosity (http://eol.org/resources/727)
  http://eol.org/schema/terms/FoliagePorosityWinter = http://purl.obolibrary.org/obo/PATO_0000984 (http://eol.org/resources/727)
  http://eol.org/schema/terms/FoliageTexture = http://purl.obolibrary.org/obo/PATO_0000700 (http://eol.org/resources/727)
  http://eol.org/schema/terms/FruitSeedColor = http://purl.obolibrary.org/obo/PATO_0000322 (http://eol.org/resources/727)
  http://eol.org/schema/terms/PrimaryGrowthForm = http://eol.org/schema/terms/thicketForming (http://eol.org/resources/727)
  http://sweet.jpl.nasa.gov/2.3/humanAgriculture.owl#Horticulture = http://eol.org/schema/terms/fruitSeedConspicuousYes (http://eol.org/resources/727)
  http://eol.org/schema/terms/NonInvasiveRange = United States (USA) (http://eol.org/resources/750)
  http://eol.org/schema/terms/ExtinctionStatus = http://eol.org/schema/terms/extant (http://eol.org/resources/741)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00002248 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00001998 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000572 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000300 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000176 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000176 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00001998 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00002982 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00002258 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_01000017 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000194 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000194 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000142 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000087 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000029 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000444 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000109 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000182 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000047 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00001998 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000109 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000360 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000020 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000111 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000043 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000303 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000255 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000020 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000261 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000300 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000109 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_01000196 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000111 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000260 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000106 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000108 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000300 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000081 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000301 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_01000206 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000086 (http://eol.org/resources/708)
  http://rs.tdwg.org/dwc/terms/habitat = http://purl.obolibrary.org/obo/ENVO_00000182 (http://eol.org/resources/708)
  http://eol.org/schema/terms/Habitat = http://eol.org/schema/terms/nonMarine (http://eol.org/resources/741)
  http://eol.org/schema/terms/Habitat = http://eol.org/schema/terms/nonMarine (http://eol.org/resources/741)

JRice commented 8 years ago

It looks like I really have to add some code to look for taxon supersedure. :S
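A minimal sketch of what such a check might look like, assuming the merges are available as a simple old-ID to new-ID hash (the real schema is not shown in this thread, and `resolve_page_id` is a made-up name). The IDs reuse the illustrative "page 1234 moved to page 678" example from above.

```ruby
# Follows a chain of merges until reaching a page ID that has not been superseded.
# Raises if the map contains a cycle rather than looping forever.
def resolve_page_id(page_id, superseded_by)
  seen = []
  current = page_id
  while superseded_by.key?(current)
    raise ArgumentError, "supersedure cycle at #{current}" if seen.include?(current)

    seen << current
    current = superseded_by[current]
  end
  current
end

# Illustrative map: 1234 merged into 678; 42 merged into 1234 before that.
SUPERSEDED_BY = { 1234 => 678, 42 => 1234 }.freeze
```

With a lookup like this, traits stored under an old page ID could be attached to the surviving page at port time instead of being dropped.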

For brevity, here's what was from IRMNG:

  http://eol.org/schema/terms/ExtinctionStatus = http://eol.org/schema/terms/extant ()
  http://eol.org/schema/terms/ExtinctionStatus = http://eol.org/schema/terms/extant ()
  http://eol.org/schema/terms/Habitat = http://eol.org/schema/terms/nonMarine ()
  http://eol.org/schema/terms/Habitat = http://eol.org/schema/terms/nonMarine ()

JRice commented 8 years ago

GBIF national node type records: Germany - There's 20,137 triples (for 927 traits) in the old TraitBank to port, but it doesn't show up with the query I'm using to find measurements.

...After some digging (not easy, grrr), the problem appears to be a lack of dwc:taxonID triples. (There are none in this graph.) In fact, the occurrences have no triples where they are the subject (meaning they don't map to anything at all).

Again: this is a problem in the OLD TraitBank, so whatever is causing this is either a PHP problem or a resource file problem.

Given that all of the other resources that didn't work are also GBIF, I'm guessing it's a problem in the resource file.

JRice commented 8 years ago

I don't see anything wrong with the file. :S It's using the right term, its index is not out of place, IDs appear to match one to the other... The IDs are a bit long, but I doubt that's a problem. ...I suppose I should check, though. Update: nah, nothing looks like it limits the length of identifiers. There's plenty of code that says "skip this if the taxon ID hasn't been seen," but they really should have been seen. I'm a bit baffled at why this is corrupt. :|

JRice commented 8 years ago

Nope, I can't figure this out. I'll need another pair of eyes.

Specifically, for the next pair of eyes: resource 872 did not create the proper set of triples when it was imported. Everything is there except for the triples of the form <occurrence> dwc:taxonID <taxon>. ...I can't figure out why these are missing. :(
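To make the symptom concrete for whoever picks this up, here is a sketch of the check over an in-memory list of triples. The two predicate URIs are the standard Darwin Core terms; how the old TraitBank actually links measurements to occurrences is an assumption here, and the subject and object URIs are invented.

```ruby
DWC_TAXON_ID      = "http://rs.tdwg.org/dwc/terms/taxonID"
DWC_OCCURRENCE_ID = "http://rs.tdwg.org/dwc/terms/occurrenceID"

# Returns occurrences that something points at via occurrenceID but that never
# appear as the subject of a taxonID triple, i.e. they map to no taxon.
def occurrences_without_taxon(triples)
  referenced = triples.select { |_s, p, _o| p == DWC_OCCURRENCE_ID }.map { |_s, _p, o| o }.uniq
  mapped     = triples.select { |_s, p, _o| p == DWC_TAXON_ID }.map { |s, _p, _o| s }.uniq
  referenced - mapped
end

# Invented sample: occurrence o1 is linked to a taxon, o2 is not.
SAMPLE_TRIPLES = [
  ["http://example.org/m1", DWC_OCCURRENCE_ID, "http://example.org/o1"],
  ["http://example.org/m2", DWC_OCCURRENCE_ID, "http://example.org/o2"],
  ["http://example.org/o1", DWC_TAXON_ID,      "http://example.org/t1"]
].freeze
```

For a graph showing the problem described above, every referenced occurrence would come back from this check.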

eliagbayani commented 8 years ago

@JRice, @jhammock, I've just uploaded a small subset of the Germany resource. I made one fix after comparing it with other successful TB resources. It was hard to pick up any difference, but I noticed the missing measurementID in the measurement extension. Maybe try a re-harvest, as a process of elimination. Thanks.

jhammock commented 8 years ago

I wonder... I can find two TB resources harvested without Measurement IDs, but those were both update re-harvests and I honestly can't remember where to look for the revisions in them.

However, we do know that TB data without Measurement IDs do port, in other resources. We just don't have an example of a successful harvest + port.

JRice commented 8 years ago

Indeed, missing Measurement IDs are not a problem. Still need another dev, though.

jhammock commented 8 years ago

Retested one of the "disappearing traits" resources (892) with a resource file generated through the new spreadsheet converter; still no traits. Something contained in the records is responsible. The taxa appear to be in good order, so presumably it is something in occurrences or measurementsOrFacts. I wonder if there's a character limit on occurrence ID? They're 42 characters long.

eliagbayani commented 8 years ago

Hi @jhammock, the problem may be due to bad character encoding; maybe MySQL doesn't accept those records. Can you give me a copy of the spreadsheet you used so I can check the characters used? Thanks.
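One quick way to test the encoding theory before involving MySQL is to scan the rows for bytes that are not valid UTF-8. This is a sketch under the assumption that the rows have already been read as arrays of strings; `rows_with_invalid_utf8` is a hypothetical name, and `String#valid_encoding?` is the only library call it relies on.

```ruby
# Returns the indexes of rows containing at least one cell that is not valid UTF-8.
def rows_with_invalid_utf8(rows)
  flagged = rows.each_with_index.select do |row, _index|
    # dup so the caller's strings keep whatever encoding they came in with.
    row.any? { |cell| !cell.dup.force_encoding(Encoding::UTF_8).valid_encoding? }
  end
  flagged.map { |_row, index| index }
end

# Sample rows: the second has a Latin-1 "é" byte (0xE9), which is invalid as UTF-8.
SAMPLE_ROWS = [
  ["Malleus", "genus"],
  ["M\xE9m. Soc.".b, "genus"],
  ["Mém. Soc.", "genus"]
].freeze
```

Rows that pass this scan can still fail for other reasons, so a clean result only rules out one suspect.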

jhammock commented 8 years ago

The excel file is here: https://www.dropbox.com/s/snh8fzuc5zkginm/GBIF%20Brazil%20test.xls?dl=0

Thanks!

eliagbayani commented 8 years ago

The solution to the missing TraitBank data remains elusive. Records from the archive file CAN be appended to MySQL, so there is nothing wrong with the characters or the encoding. Still no solution for now :-(