czbiohub-sf / tabula-muris-senis

Tabula Muris Senis
http://tabula-muris-senis.ds.czbiohub.org
BSD 3-Clause "New" or "Revised" License

Fastq annotations for Plate_seq fastq files #36

Open seunghooh5 opened 2 years ago

seunghooh5 commented 2 years ago

There is a fastq_annotated.csv file in s3://czb-tabula-muris-senis/Plate_seq/3_month/ that lists the samples and their corresponding S3 URIs. However, there is no such file for the other age groups: 18, 21, and 24 months (Plate_seq/18_month/, Plate_seq/21_month/, and Plate_seq/24_month/, respectively). I tried to work around this by taking the cell IDs from tabula-muris-senis-facs-official-raw-obj__cell-metadata.csv in /metadata and comparing each one against the URIs of all Plate_seq FASTQ files, which I listed with the AWS CLI command aws s3 ls s3://czb-tabula-muris-senis/Plate_seq/${month}_month/. This led to another problem: some cells have two different FASTQ files with the same cell ID, where only the enclosing directory names differ. For example:

s3://czb-tabula-muris-senis/Plate_seq/3_month/170925_A00111_0066_AH3TKNDMXX/fastqs/A1-B000126-3_39_F-1-1_R1_001.fastq.gz
s3://czb-tabula-muris-senis/Plate_seq/3_month/170925_A00111_0066_AH3TKNDMXX__170925_A00111_0067_BH3M5YDMXX/fastqs/A1-B000126-3_39_F-1-1_R1_001.fastq.gz
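The duplicate check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's own tooling: it assumes FASTQ names end in a read suffix such as _R1_001.fastq.gz, that the part of the filename before that suffix identifies the cell, and that the directory two levels above the file is the one that differs. The function names (sample_prefix, group_dirs_by_sample, ambiguous_samples) are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: read files end in _R1_<digits>.fastq.gz or _R2_<digits>.fastq.gz.
READ_SUFFIX = re.compile(r"_R[12]_\d+\.fastq\.gz$")


def sample_prefix(key):
    """Return the filename without its read suffix, or None if not a FASTQ."""
    name = PurePosixPath(key).name
    if not READ_SUFFIX.search(name):
        return None
    return READ_SUFFIX.sub("", name)


def group_dirs_by_sample(keys):
    """Map each sample prefix to the sorted directories that contain it."""
    dirs = defaultdict(set)
    for key in keys:
        prefix = sample_prefix(key)
        if prefix is None:
            continue  # skip anything that is not a FASTQ read file
        parts = PurePosixPath(key).parts
        # Assumed layout: .../<directory>/fastqs/<file>, so take parts[-3].
        enclosing = parts[-3] if len(parts) >= 3 else ""
        dirs[prefix].add(enclosing)
    return {prefix: sorted(found) for prefix, found in dirs.items()}


def ambiguous_samples(keys):
    """Keep only the samples that appear under more than one directory."""
    grouped = group_dirs_by_sample(keys)
    return {p: d for p, d in grouped.items() if len(d) > 1}


if __name__ == "__main__":
    example_keys = [
        "s3://czb-tabula-muris-senis/Plate_seq/3_month/"
        "170925_A00111_0066_AH3TKNDMXX/fastqs/"
        "A1-B000126-3_39_F-1-1_R1_001.fastq.gz",
        "s3://czb-tabula-muris-senis/Plate_seq/3_month/"
        "170925_A00111_0066_AH3TKNDMXX__170925_A00111_0067_BH3M5YDMXX/fastqs/"
        "A1-B000126-3_39_F-1-1_R1_001.fastq.gz",
    ]
    for prefix, found in ambiguous_samples(example_keys).items():
        print(prefix, found)
```

The keys would come from a recursive listing (aws s3 ls --recursive), taking the last column of each output line. This only flags the ambiguous samples; it cannot tell which of the two files is the one used in the published data.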

Could you please provide the FASTQ annotation data?