NCBI-Hackathons / EDirectCookbook

MIT License

Formatting and missing value issue in xtract metadata from bioproject #26

Closed ghost closed 6 years ago

ghost commented 6 years ago

I am trying to extract metadata from a list of BioProject IDs. The script I have works fine, but it does not handle missing data well. I read in the xtract help that the -def flag sets a default placeholder for missing fields, so I added -def "Na", but it does not change anything in the output.

This is my code:

for i in $(cat "$A"); do
    ll=$(esearch -db bioproject -query "$i" |
        efetch -format xml |
        xtract -pattern DocumentSummary -element \
        $G,$S,$M,$O,$OT,$TR,$H,$Sa,$BR,$Tro,$Sco,$Org,$E,$Otr,\
        $Phen,$Dis -def "Na")
    echo -e "$i\t$ll" >> "$B"
done

This is my output:

PRJNA310173 PRJNA253675 eNegative eBacilli eNo eAerobic eMesophilic eHostAssociated Tularemia

This is what I would like to have:

PRJNA310173 Na Na Na Na Na Na Na PRJNA253675 eNegative eBacilli eNo eAerobic eMesophilic eHostAssociated Tularemia

ghost commented 6 years ago

Well, I guess it was quite a trivial problem: the flag was in the wrong position. I simply moved -def "Na" so that it comes before the -element list instead of after it.
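For anyone landing here later, the fix might look roughly like the sketch below. It is only an illustration of the argument order: `fetch_row` is a made-up helper name, and the field arguments stand in for the `$G`, `$S`, ... variables from the original loop, whose contents are not shown in this thread.

```shell
# Sketch only: fetch_row is a hypothetical helper, not part of EDirect.
# First argument: a BioProject ID. Remaining arguments: the field list
# to hand to -element (placeholders for $G, $S, ... from the loop above).
fetch_row() {
    project_id=$1
    shift
    # -def comes BEFORE -element, so the "Na" placeholder is already set
    # when xtract processes the requested fields.
    esearch -db bioproject -query "$project_id" |
        efetch -format xml |
        xtract -pattern DocumentSummary -def "Na" -element "$@"
}

# Example usage (requires EDirect and network access), assuming $A is the
# file of IDs and $B the output file, as in the original script:
# while read -r id; do
#     printf '%s\t%s\n' "$id" "$(fetch_row "$id" "$G" "$S" "$M")"
# done < "$A" >> "$B"
```

Reading the IDs with `while read` instead of `for i in $(cat ...)` is just a habit of mine; the part that matters for the missing values is where `-def` sits.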