YaraAlshw / finalproject

Failed downloads #1

Closed caseywdunn closed 3 years ago

caseywdunn commented 4 years ago

What was the specific problem that you had downloading the data to the cluster?

How big do you expect these files to be? It may be worth downloading them locally and then transferring them to the cluster.

YaraAlshw commented 3 years ago

Here is a link to the subset of the samples I downloaded: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?exp=SRX6399249+SRX6399250+SRX6399251+SRX6399252+SRX6399253+SRX6399255+SRX6399256+SRX6399259+SRX6399260+SRX6399261+SRX6399262+SRX6399263+SRX6399264+SRX6399265+SRX6399266+SRX6399267+SRX6399268+SRX6399269+SRX6399270+SRX6399271+SRX6399272+SRX6399273+SRX6399274+SRX6399275+SRX6399276+SRX6399277+SRX6399278+SRX6399279+SRX6399280+SRX6399281+SRX6399282+SRX6399283+SRX6399284+SRX6399285+SRX6399286+SRX6399287+SRX6399288+SRX6399289+SRX6399298+SRX6399299+SRX6399300+SRX6399301+SRX6399302+SRX6399303+SRX6399317+SRX6399328+SRX6399329+SRX6399330+SRX6399331+SRX6399332+SRX6399334+SRX6399335+SRX6399336+SRX6399337+SRX6399338+SRX6399339+SRX6399344+SRX6399360+SRX6399361+SRX6399362+SRX6399363+SRX6399364+SRX6399365+SRX6399366+SRX6399367+SRX6399369+SRX6399370+SRX6399371+SRX6399372+SRX6399373+SRX6399374+SRX6399375+SRX6399376+SRX6399377+SRX6399378+SRX6399380+SRX6399381+SRX6399393+SRX6399394+SRX6399395+SRX6399396+SRX6399397+SRX6399398+SRX6399399+SRX6399400+SRX6399401+SRX6399402+SRX6399403+SRX6399404+SRX6399405+SRX6399406+SRX6399407+SRX6399408+SRX6399409+SRX6399410+SRX6399411+SRX6399412+SRX6399424+SRX6399425+SRX6399426+SRX6399427+SRX6399428+SRX6399429+SRX6399430+SRX6399431+SRX6399432+SRX6399433+SRX6399445+SRX6399446+SRX6399447+SRX6399448+SRX6399449+SRX6399450+SRX6399451+SRX6399452+SRX6399453+SRX6399454&cmd=search&m=downloads&s=seq

Could you please help me identify whether these fasta files contain sequences that are already demultiplexed? If so, how can I identify the stage that these reads are at so I can plan the next step in data processing? Thank you.

caseywdunn commented 3 years ago

SRA is arranged hierarchically, as explained here - https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/

I find it easiest to navigate from the top down.

The study entry for these data ( https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP197350 ) indicates that there are 1174 experiments and 1174 runs, so there is a 1:1 correspondence there.

The bioproject ( https://www.ncbi.nlm.nih.gov//bioproject/PRJNA542138 ) indicates 1174 experiments and 1180 samples. This suggests to me that each experiment corresponds to a single sample, and that a few samples were not sequenced.

Clicking through on the samples, you can see that each sample is a single individual. So this suggests that each experiment is a dataset for a single individual.

The accessions that you downloaded are experiments, so based on the sleuthing above, each should be for a sample, which is an individual.

So these look like demultiplexed reads to me. Each file corresponds to a single sampled individual.
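
If you want to double-check this rather than rely on the counts alone, here is a minimal sketch of the same check in Python. It assumes you have a metadata table for the study with one row per run and columns named `Run`, `Experiment`, and `BioSample`; those column names are an assumption about your export, so adjust them as needed. The experiment accessions below come from your link, but the run and sample IDs are made-up placeholders, not real accessions.

```python
from collections import defaultdict


def one_sample_per_experiment(rows):
    """Return True if every experiment has exactly one run and one sample.

    `rows` is an iterable of dicts with the keys "Run", "Experiment" and
    "BioSample" (assumed column names; rename to match your table).
    """
    runs_by_experiment = defaultdict(set)
    samples_by_experiment = defaultdict(set)
    for row in rows:
        runs_by_experiment[row["Experiment"]].add(row["Run"])
        samples_by_experiment[row["Experiment"]].add(row["BioSample"])

    single_run = all(len(runs) == 1 for runs in runs_by_experiment.values())
    single_sample = all(len(s) == 1 for s in samples_by_experiment.values())
    return single_run and single_sample


# Toy table: real experiment accessions from the link above,
# placeholder (hypothetical) run and sample identifiers.
clean_rows = [
    {"Run": "RUN_1", "Experiment": "SRX6399249", "BioSample": "SAMPLE_1"},
    {"Run": "RUN_2", "Experiment": "SRX6399250", "BioSample": "SAMPLE_2"},
]

# Same table, but with a second sample attached to one experiment,
# which is what a pooled (not demultiplexed) experiment would look like.
pooled_rows = clean_rows + [
    {"Run": "RUN_2", "Experiment": "SRX6399250", "BioSample": "SAMPLE_3"},
]

clean_ok = one_sample_per_experiment(clean_rows)
pooled_ok = one_sample_per_experiment(pooled_rows)
```

With a real table you could load the rows with `csv.DictReader` and pass them in directly. A `True` result would be consistent with the reads being demultiplexed, though it only tells you what the metadata says, not what is in the files.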

cc @lemellenthin

YaraAlshw commented 3 years ago

Thank you! This is very helpful.