boutroslab / CRISPRAnalyzeR

CRISPRAnalyzeR: interactive analysis, annotation and documentation of pooled CRISPR screens
GNU General Public License v2.0

Input Files for GeCKO V2 #11

Closed DarioS closed 7 years ago

DarioS commented 7 years ago

My screen dataset has two FASTQ files for each sample: one for Library A and one for Library B. I'm not sure how typical this is of GeCKO datasets, because no one makes their raw data public in repositories like SRA. Anyway, assuming it's standard to sequence each sample separately for the two libraries, it seems that the web application expects each pair of files to be concatenated before upload to the server. The intended audience is biologists, and I don't think many of them could correctly use cat on the pairs of fastq.gz files and redirect stdout to a gzip file at the command line. Concatenating may also be a bad idea if one file has a quality problem but the other doesn't, so it'd be best if they could be kept separate. Could the Upload Your Data section handle this easily?
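
For anyone who does need to merge the pairs in the meantime, here is a minimal Python sketch of the concatenation step. It relies on the gzip format allowing several compressed members to be appended back to back, which is the same thing cat does on the command line. The function name `concat_gzip` and the file names in the usage note are placeholders, not anything CRISPRAnalyzeR provides.

```python
import shutil
from pathlib import Path


def concat_gzip(parts, destination):
    """Append each gzip file in `parts` to `destination`, byte for byte.

    The inputs are never decompressed: gzip files joined end to end form a
    valid multi-member gzip file, so the result decompresses to the
    concatenation of the original contents.
    """
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large FASTQ files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return Path(destination)
```

Something like `concat_gzip(["sample1_libA.fastq.gz", "sample1_libB.fastq.gz"], "sample1_merged.fastq.gz")` would then produce one file per sample. Note that this does nothing about the quality concern: a bad Library A file would still end up mixed into the merged output.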

jwinter6 commented 7 years ago

Hi Dario, I will also make individual FASTA files available for this purpose.

In the meantime, you can use the whole-library FASTA together with either the A or the B sequencing files. Just activate "remove low sgRNA readcounts" and set this to >= 0. This will automatically remove the missing sgRNAs from the analysis, which in your case is the missing library part. This way you can use the full GeCKO FASTA file but get results tailored to your sequencing dataset.
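
To illustrate the idea only: the sketch below drops every sgRNA that received no reads in any sample, which is the effect described above when only one half-library was sequenced. It is not CRISPRAnalyzeR's code, and the exact threshold semantics of the "remove low sgRNA readcounts" option are an assumption here. The function name and the sgRNA identifiers are made up for the example.

```python
def drop_unsequenced_sgrnas(counts):
    """Keep only sgRNAs with at least one read in at least one sample.

    `counts` maps an sgRNA identifier to a list of read counts, one per sample.
    """
    return {
        sgrna: reads
        for sgrna, reads in counts.items()
        if any(r > 0 for r in reads)
    }


# Hypothetical read counts: the Library B sgRNA was never sequenced.
example_counts = {
    "libA_sgRNA_1": [120, 98],
    "libA_sgRNA_2": [0, 14],
    "libB_sgRNA_1": [0, 0],
}
filtered = drop_unsequenced_sgrnas(example_counts)
```

One caveat with this approach is that a Library A sgRNA that truly dropped out in every sample would be removed as well, since it is indistinguishable from an sgRNA that was never in the sequenced half.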

Best, Jan

jwinter6 commented 7 years ago

Added individual files in e531f7161a495f01ced178069b0673a5a3525f63