Open zedomel opened 3 years ago
@zedomel thank you for sharing your idea to improve elton by allowing it to extract interactions from DarwinCore archives without first having to make a globi.json configuration file via elton init ... (or manually).
I need a little time to think about how to implement this. Hoping to get back to this sooner rather than later.
I also very much like the other (unrelated) activities you mentioned: importing https://deeplinker.bio into AWS EMR and querying the values from associatedTaxa, associatedOccurrences and resource relationship tables.
I've created separate issues for these in the preston repository:
https://github.com/bio-guoda/preston/issues/114 -> AWS EMR
https://github.com/bio-guoda/preston/issues/115 -> querying for specific values in associatedTaxa, associatedOccurrences, resource relationships
Do you mind continuing the conversation about these neat feature ideas / activities there?
Hi @jhpoelen,
Following what we have discussed about indexing biotic interactions from GBIF, I have some questions which may require adding new features to elton. Let's see.
I'm using the following command to get all DwC-Archives from deeplinker.bio:
It gives me a list of URLs of DwC-Archives:
Now I'm trying to extract interaction data from these archives using elton:
c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55 is the hash of the latest bio graph.
But the way elton works (as far as I know), I first need to run elton init, passing --data-url and --data-citation, to create the globi.json file, and then set format: "dwca".
I'm wondering if there is some way to skip elton init and use elton interactions to extract all interactions from the DwC-As directly.
Maybe then I can do something like:
The awk -F\t '$27!=""' is appended to the command in order to get only "complete" interaction records, since elton will output records with an empty targetTaxonName (field number 27) when it can't find any interactions (for example, when the DwC-A does not contain any data for associatedTaxa).
In parallel, I'm editing the scripts in https://github.com/bio-guoda/preston-scripts to store these DwC-As in an AWS EMR facility.
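To make the awk filter mentioned above concrete, here is a minimal sketch that runs it on synthetic rows rather than on real elton output. The helper names (filter_complete, make_row) and the 30-column row width are assumptions made for illustration; the only detail taken from this thread is that targetTaxonName is field 27. The separator is quoted as -F'\t' so the shell passes the backslash through to awk unchanged.

```shell
#!/bin/sh
# Keep only rows whose 27th tab-separated field (targetTaxonName,
# according to this thread) is non-empty.
filter_complete() {
  awk -F'\t' '$27 != ""'
}

# Hypothetical helper: build one synthetic row of 30 tab-separated
# fields, with field 27 set to the given value. The width of 30 is
# arbitrary and does not reflect elton's real output.
make_row() {
  awk -v target="$1" 'BEGIN {
    for (i = 1; i <= 30; i++) {
      value = (i == 27) ? target : "f" i
      printf "%s%s", value, (i < 30 ? "\t" : "\n")
    }
  }'
}

# One "complete" row and one row with an empty field 27.
sample=$( { make_row "Apis mellifera"; make_row ""; } | filter_complete )
kept=$(printf '%s\n' "$sample" | awk 'END { print NR }')
echo "kept $kept of 2 rows"
```

In a real pipeline only filter_complete would be needed, appended after the command that produces the tab-separated interaction records.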
Additionally, for the kind of analysis that I'm trying to do, it would be interesting to know in which DwC fields the interactions are stored (associatedTaxa, associatedOccurrences, ResourceRelationship). Is there any way to get that information too?
Thanks.
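Independently of elton, one rough way to see which of these fields carry values is to count non-empty cells per column in an archive's occurrence table. The sketch below assumes a tab-separated file with a header row; the helper name count_filled and the sample rows are made up for illustration. It does not cover the ResourceRelationship extension, which lives in a separate table, and it only shows where raw values exist, not which field elton derived a given interaction from.

```shell
#!/bin/sh
# Hypothetical helper: for each named column, count the rows with a
# non-empty value in a tab-separated file that has a header row.
count_filled() {
  file=$1; shift
  for name in "$@"; do
    awk -F'\t' -v name="$name" '
      NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
      col && $col != "" { n++ }
      END { printf "%s\t%d\n", name, n }
    ' "$file"
  done
}

# Made-up sample data standing in for an occurrence table.
tmp=$(mktemp)
printf 'occurrenceID\tassociatedTaxa\tassociatedOccurrences\n' > "$tmp"
printf 'occ1\thost: Quercus alba\t\n' >> "$tmp"
printf 'occ2\t\t\n' >> "$tmp"
printf 'occ3\teats: Apis mellifera\thttp://example.org/occ9\n' >> "$tmp"

counts=$(count_filled "$tmp" associatedTaxa associatedOccurrences)
echo "$counts"
rm -f "$tmp"
```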