Open KenSaville opened 4 years ago
There is a major problem with the entire book: in the months after it was written, NCBI changed many of the data formats, as they themselves weren't sure what the most appropriate way to distribute the data was. The concepts are still valid, but the slight changes in the data make the code behave differently.
The whole book will be rewritten in the next two months, using a new service by NCBI called datasets: https://www.ncbi.nlm.nih.gov/datasets/
OK
I was able to work around it - just thought I'd point it out. I'll be using the handbook to teach a class starting in about a month.
I'm fine with running into issues here and there and then trying to figure it out, sharing the process with students.
That's half, if not more, of the battle.
Ken
-- Ken Saville, PhD A.M. Chickering Professor Chair, Biology Department Albion College
The command
cat metadata.txt | grep Dec | grep complete | grep -v gapped | cut -f 1 > early.ids
returns no lines from the metadata.txt file.
I believe it's because the dates are now in numeric form.
Grepping for 2019 may solve the problem, but it wouldn't work if there were sequences from other months in 2019.
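One possible workaround is to match on year and month together rather than on the year alone. The sketch below assumes the numeric dates are written as YYYY-MM-DD (for example 2019-12-26) and that the file is tab-separated with the ID in the first column; neither is confirmed here, so adjust the pattern to whatever metadata.txt actually contains. The function name `early_ids` is hypothetical.

```shell
# Hypothetical helper: print the IDs of complete, ungapped records
# whose line contains the given year-month string.
# Assumption: dates look like YYYY-MM-DD and the ID is in column 1.
early_ids() {
    # $1 = year-month pattern (e.g. 2019-12), $2 = metadata file
    grep -- "$1" "$2" | grep complete | grep -v gapped | cut -f 1
}

# Usage (same output file as the original command):
#   early_ids 2019-12 metadata.txt > early.ids
```

Matching `2019-12` keeps December 2019 records while skipping other months of 2019, which a plain `grep 2019` would not do.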