Closed ygouin closed 1 month ago
Thanks, @ygouin! The exact answer depends on the type of record. For the current release, we source data from:
LTP (e.g., X80994), GTDB (e.g., GB-GCA-003856275.1-RQSQ01000048.1, though note we replace _ with -), WoL (e.g., G001820395), and operons (e.g., MJ010-2-barcode60-umi1554bins-ubs-138). All of those full-length 16S records are in the publicly available backbone artifact.
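If it helps when matching these IDs against an upstream resource, the underscore-to-hyphen replacement noted above can be sketched in Python as follows. The function name and the upstream form of the example ID are hypothetical (the upstream form is simply the example with the replacement reversed); only the replacement itself comes from the description above.

```python
def to_backbone_id(upstream_id: str) -> str:
    """Replace every underscore with a hyphen, as described for these IDs."""
    return upstream_id.replace("_", "-")


# Hypothetical upstream form, inferred by reversing the replacement.
print(to_backbone_id("GB_GCA_003856275.1_RQSQ01000048.1"))
```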
We additionally place 16S V4 fragments, and some full-length records from SILVA, separately. The 16S fragments come from Qiita, as obtained by redbiom. Those IDs are represented in one of three ways: by ASV (e.g., TACAGAACCCCCGAGCGTTACCCGGATTTATTGGGCGTAAAGGGTCTGTAGGTGGTCACGTAAGTTTCAAGTTAAAGCTCTTCGGCTTAA), by MD5 (e.g., 48700b14c9d20c30b0f575d3e01e0e5c), or by an arbitrary 8-digit identifier (e.g., 22383138). The SILVA records retain their original ID (e.g., JQIO01000430.1097739.1099039).
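As a rough illustration of telling the three fragment ID forms apart, here is a minimal Python sketch. The patterns are assumptions inferred only from the example IDs in this thread (uppercase nucleotides for an ASV, 32 lowercase hex characters for an MD5, exactly 8 digits for the arbitrary identifier), and the function name is hypothetical; it is not part of any released tooling.

```python
import re

# Patterns inferred only from the example IDs in this thread (assumptions).
_PATTERNS = [
    ("asv", re.compile(r"[ACGTN]+")),        # the sequence itself is the ID
    ("md5", re.compile(r"[0-9a-f]{32}")),    # 32 lowercase hex characters
    ("numeric", re.compile(r"[0-9]{8}")),    # arbitrary 8-digit identifier
]


def classify_fragment_id(record_id: str) -> str:
    """Return 'asv', 'md5', 'numeric', or 'other' based on the ID's shape."""
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(record_id):
            return label
    return "other"
```

Anything that does not match one of the three shapes, such as a SILVA ID like JQIO01000430.1097739.1099039, falls through to "other".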
With regard to coordinates, it depends on the exact data being considered. The ASVs, for example, are all expected to be 16S V4, with many (hopefully all!) starting at 515F, but that depends on the depositors having provided accurate information for the included samples. For the operons, I likely have the coordinates but would need to track them down; I can do so if helpful. For LTP and GTDB, we use what is provided as is. For WoL, I think coordinates are provided in that resource, but again I may be able to track them down if needed.
I think that covers it. Please let me know if there are further questions!
Is there a way to get mapping/coordinate information about where the sequences are from?