PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0

Support reading cell IDs from MEX datasets with cells.tsv.gz #1213

Open arteymix opened 2 weeks ago

arteymix commented 2 weeks ago

I've seen MEX datasets that ship a cells.tsv.gz file a couple of times. This needs more investigation.

Example from GSM7431296:

| bc_wells | sample | species | gene_count | tscp_count | mread_count | bc1_well | bc2_well | bc3_well | bc1_wind | bc2_wind | bc3_wind |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 01_01_07 | H10G | hg38 | 5390 | 30697 | 68925 | A1 | A1 | A7 | 1 | 1 | 7 |
| 01_01_61 | H10G | hg38 | 5333 | 17792 | 37604 | A1 | A1 | F1 | 1 | 1 | 61 |
| 01_01_73 | H10G | hg38 | 7621 | 38151 | 85604 | A1 | A1 | G1 | 1 | 1 | 73 |
| 01_01_77 | H10G | hg38 | 6047 | 21603 | 46507 | A1 | A1 | G5 | 1 | 1 | 77 |
| 01_01_93 | H10G | hg38 | 6407 | 23703 | 52339 | A1 | A1 | H9 | 1 | 1 | 93 |
| 01_02_13 | H10G | hg38 | 7692 | 40144 | 90362 | A1 | A2 | B1 | 1 | 2 | 13 |
| 01_02_48 | H10G | hg38 | 5588 | 18615 | 41831 | A1 | A2 | D12 | 1 | 2 | 48 |
| 01_02_72 | H10G | hg38 | 5764 | 18604 | 43937 | A1 | A2 | F12 | 1 | 2 | 72 |
| 01_02_90 | H10G | hg38 | 8780 | 54257 | 124186 | A1 | A2 | H6 | 1 | 2 | 90 |
| 01_03_32 | H10G | hg38 | 10113 | 76711 | 175626 | A1 | A3 | C8 | 1 | 3 | 32 |
| 01_03_49 | H10G | hg38 | 5899 | 19187 | 42449 | A1 | A3 | E1 | 1 | 3 | 49 |
| 01_04_01 | H10G | hg38 | 9473 | 64740 | 144988 | A1 | A4 | A1 | 1 | 4 | 1 |
| 01_04_21 | H10G | hg38 | 6235 | 24057 | 53825 | A1 | A4 | B9 | 1 | 4 | 21 |
| 01_04_29 | H10G | hg38 | 7186 | 28269 | 59569 | A1 | A4 | C5 | 1 | 4 | 29 |
| 01_04_66 | H10G | hg38 | 6829 | 27376 | 59401 | A1 | A4 | F6 | 1 | 4 | 66 |
| 01_04_82 | H10G | hg38 | 5537 | 19269 | 44030 | A1 | A4 | G10 | 1 | 4 | 82 |
| 01_05_03 | H10G | hg38 | 5523 | 17225 | 37633 | A1 | A5 | A3 | 1 | 5 | 3 |
| 01_05_05 | H10G | hg38 | 6542 | 19700 | 23385 | A1 | A5 | A5 | 1 | 5 | 5 |
| 01_05_12 | H10G | hg38 | 6767 | 28806 | 65188 | A1 | A5 | A12 | 1 | 5 | 12 |
| 01_05_44 | H10G | hg38 | 6989 | 46492 | 103139 | A1 | A5 | D8 | 1 | 5 | 44 |
| 01_05_65 | H10G | hg38 | 8671 | 45116 | 94556 | A1 | A5 | F5 | 1 | 5 | 65 |
| 01_05_83 | H10G | hg38 | 5223 | 15851 | 33218 | A1 | A5 | G11 | 1 | 5 | 83 |
| 01_05_90 | H10G | hg38 | 5542 | 19030 | 43234 | A1 | A5 | H6 | 1 | 5 | 90 |
| 01_07_05 | H10G | hg38 | 5655 | 14748 | 17480 | A1 | A7 | A5 | 1 | 7 | 5 |
| 01_07_12 | H10G | hg38 | 5857 | 19263 | 38110 | A1 | A7 | A12 | 1 | 7 | 12 |
| 01_07_13 | H10G | hg38 | 6826 | 27052 | 57082 | A1 | A7 | B1 | 1 | 7 | 13 |
| 01_07_26 | H10G | hg38 | 5651 | 16517 | 34259 | A1 | A7 | C2 | 1 | 7 | 26 |
| 01_07_27 | H10G | hg38 | 13816 | 241677 | 528427 | A1 | A7 | C3 | 1 | 7 | 27 |
| 01_07_40 | H10G | hg38 | 8449 | 46670 | 103642 | A1 | A7 | D4 | 1 | 7 | 40 |
| 01_07_41 | H10G | hg38 | 7006 | 27931 | 58183 | A1 | A7 | D5 | 1 | 7 | 41 |
| 01_07_49 | H10G | hg38 | 9521 | 68888 | 152430 | A1 | A7 | E1 | 1 | 7 | 49 |

arteymix commented 2 weeks ago

I've looked at other such files and they generally contain barcodes. We need to make sure we properly validate the file on import.
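
For illustration only, here is a rough sketch of what such an import-time check could look like. It is written in Python for brevity and is not Gemma's actual loader. The function name `read_cell_ids`, the `has_header` flag, and the assumption that the cell ID sits in the first column (as `bc_wells` does in the GSM7431296 example) are all hypothetical; `expected_count` stands for the number of columns in the accompanying matrix file.

```python
import csv
import gzip


def read_cell_ids(path, expected_count, has_header=True):
    """Read cell IDs from the first column of a gzipped TSV and validate them.

    Hypothetical helper: assumes the first column holds the cell ID and that
    expected_count is the number of cells declared by the matrix file.
    """
    with gzip.open(path, "rt", newline="") as handle:
        # Skip completely blank lines, which csv.reader yields as empty lists.
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]

    if has_header:
        rows = rows[1:]  # drop the column-name row (e.g. bc_wells, sample, ...)

    cell_ids = [row[0].strip() for row in rows]

    if any(not cell_id for cell_id in cell_ids):
        raise ValueError("cells file contains an empty cell ID")
    if len(set(cell_ids)) != len(cell_ids):
        raise ValueError("cells file contains duplicate cell IDs")
    if len(cell_ids) != expected_count:
        raise ValueError(
            f"expected {expected_count} cell IDs, found {len(cell_ids)}"
        )
    return cell_ids
```

Whether a header row is present would have to be decided per dataset; the sketch leaves that to the caller rather than guessing.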