Open fils opened 4 years ago
There are a few validation locations.
One is in acquire: acquire.go starting at line 110.
This simply checks if the JSON-LD is well formed. It has to be well formed to be usable. You can check your JSON-LD using https://json-ld.org/playground/. Simply paste some example JSON-LD there to confirm that it is well formed.
This validation function (at line 177) actually checks two things: 1) is the JSON-LD well formed such that it can be unmarshalled, and 2) can the JSON-LD produce proper RDF, i.e., are the IRIs formed by the JSON-LD valid in RDF.
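To make the two checks concrete, here is a rough sketch using only the Go standard library. It is not the code in acquire.go: the names `checkJSONLD` and `checkIDs` are made up for illustration, and the IRI check is a simplification (it only requires every `@id` to be an absolute IRI or a blank node label, and does no real JSON-LD expansion or RDF conversion).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// checkJSONLD is a hypothetical helper, not Gleaner's function.
// Check 1: the document can be unmarshalled as JSON.
// Check 2: every "@id" value could plausibly serve as an RDF node identifier.
func checkJSONLD(doc []byte) error {
	var data interface{}
	if err := json.Unmarshal(doc, &data); err != nil {
		return fmt.Errorf("not well formed: %w", err)
	}
	return checkIDs(data)
}

// checkIDs walks the decoded JSON and inspects each "@id" it finds.
func checkIDs(v interface{}) error {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			if k != "@id" {
				if err := checkIDs(val); err != nil {
					return err
				}
				continue
			}
			s, ok := val.(string)
			if !ok {
				return fmt.Errorf("@id is not a string: %v", val)
			}
			if strings.HasPrefix(s, "_:") {
				continue // blank node label, accepted as-is
			}
			u, err := url.Parse(s)
			if err != nil || !u.IsAbs() {
				return fmt.Errorf("@id %q is not an absolute IRI", s)
			}
		}
	case []interface{}:
		for _, item := range t {
			if err := checkIDs(item); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	doc := []byte(`{"@context":"http://schema.org","@type":"DataRecord",
		"@id":"https://www.ebi.ac.uk/biosamples/samples/SAMEA4088955"}`)
	if err := checkJSONLD(doc); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("passed both checks")
}
```

A real JSON-LD processor resolves relative `@id` values against a base IRI, so this sketch is stricter than the specification in that respect.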
Can you send any error you are getting so that I can see if this is the issue? If it is, I am fine with putting a flag for this in the config file and making the check optional. Though really your JSON-LD should make it past this point to be usable by most JSON-LD tooling.
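As a sketch of what such an optional check might look like: the `Options` struct and its `SkipValidation` field below are hypothetical and do not exist in Gleaner's config today, and `json.Valid` stands in for the real validation step.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Options is a hypothetical config struct; SkipValidation is an assumed flag name.
type Options struct {
	SkipValidation bool
}

// admit decides whether a harvested document is kept.
// When SkipValidation is set, the document is accepted without any check.
func admit(opts Options, doc []byte) error {
	if opts.SkipValidation {
		return nil
	}
	if !json.Valid(doc) {
		return errors.New("document is not well-formed JSON")
	}
	return nil
}

func main() {
	broken := []byte(`{"@type": "DataRecord"`) // missing closing brace

	fmt.Println(admit(Options{SkipValidation: false}, broken) != nil) // validation on
	fmt.Println(admit(Options{SkipValidation: true}, broken) != nil)  // validation off
}
```

Skipping the check only moves the failure downstream: a document that cannot be parsed here will still fail in any later step that needs to read it.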
The use case URL is: https://www.ebi.ac.uk/biosamples/samples/SAMEA4088955
One key issue here is that Gleaner is looking for http://schema.org/Dataset and this JSON-LD doesn't have that. It uses "DataRecord", which is likely from the OBI or biosample namespace. However, the default context is set to schema.org and the @type is DataRecord, and there is no such type in schema.org.
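A quick way to see the mismatch is to look at the top-level @type directly. The sketch below is an illustration only, not how Gleaner does its lookup: `hasType` is a made-up helper, it compares type names as plain strings (with an optional schema.org prefix) instead of expanding the @context, and it expects a single top-level JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hasType reports whether the top-level "@type" of doc names want.
// Hypothetical helper: no JSON-LD context expansion is performed.
func hasType(doc []byte, want string) (bool, error) {
	var node map[string]interface{}
	if err := json.Unmarshal(doc, &node); err != nil {
		return false, err
	}

	// "@type" may be a single string or an array of strings.
	var types []string
	switch t := node["@type"].(type) {
	case string:
		types = append(types, t)
	case []interface{}:
		for _, item := range t {
			if s, ok := item.(string); ok {
				types = append(types, s)
			}
		}
	}

	for _, s := range types {
		s = strings.TrimPrefix(s, "https://schema.org/")
		s = strings.TrimPrefix(s, "http://schema.org/")
		if s == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	doc := []byte(`{"@context":"http://schema.org","@type":"DataRecord"}`)
	ok, err := hasType(doc, "Dataset")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("has Dataset type:", ok)
}
```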
The URL is below. Note that this error also means these records will not show up in Google Dataset Search either.
We have some guidance on type dataset at: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md
The biosample markup extends the schema from schema.org (not part of the current specification). More information can be found at: https://bioschemas.org
In an effort to scrape https://www.ebi.ac.uk/biosamples/sitemap/599, Gleaner threw the error:
ebibiosamples 2m45s [--------------------------------------------------------------------] 100%
panic: runtime error: index out of range [0] with length 0

goroutine 41 [running]:
earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress.(*Bar).Bytes(0xc00087c300, 0xc000066f60, 0xc00054bf10, 0x1)
    /home/fils/src/go/src/earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress/bar.go:195 +0x51f
earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress.(*Bar).String(...)
    /home/fils/src/go/src/earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress/bar.go:214
earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress.(*Progress).print(0xc000066fc0)
    /home/fils/src/go/src/earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress/progress.go:127 +0xa5
earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress.(*Progress).Listen(0xc000066fc0)
    /home/fils/src/go/src/earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress/progress.go:114 +0x49
created by earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress.(*Progress).Start
    /home/fils/src/go/src/earthcube.org/Project418/gleaner/vendor/github.com/gosuri/uiprogress/progress.go:134 +0x46
Is this related to the JSON-LD validation, or is it a different problem?
The run configuration is:
gleaner
  summon: true
  mill: true
millers
  graph: true
  shacl: false
OS is Windows 10
Thank you
The task is to scrape biomedical markup data. The data extends the schema.org schema, so we don't need the validation that Gleaner performs.