boutroslab / CRISPRAnalyzeR

CRISPRAnalyzeR: interactive analysis, annotation and documentation of pooled CRISPR screens
GNU General Public License v2.0

Uploaded fastq.gz files seem to be recognized as read count files #14

Closed fzg26493 closed 7 years ago

fzg26493 commented 7 years ago

I uploaded several fastq.gz files and proceeded to Check Files. Several minutes later, I got the following message.

Content of extracted and mapped aa.fastq.gz does not look like a readcount file. For further assistence please briefly describe your problem on the CRISPRAnalyzeR Github page in the issue section @ http://github.com/boutroslab/CRISPRAnalyzeR.

Sizes of fastq.gz files are around 500 MB each. Any suggestions?

benediktrauscher commented 7 years ago

Hi, any chance that your fastq.gz files do not end with '.fastq.gz'? CRISPRAnalyzeR recognizes the type of a file by its ending. If the uploaded file ends with anything other than '.fastq.gz', it assumes that a read count file was uploaded.
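To illustrate the rule described above, here is a minimal Python sketch of classification by file ending. It is not CRISPRAnalyzeR's actual code: the function name `classify_upload` is hypothetical, and the case-sensitive comparison is an assumption.

```python
def classify_upload(filename: str) -> str:
    """Classify an uploaded file by its ending (hypothetical helper)."""
    # Assumption: only the exact, case-sensitive suffix '.fastq.gz'
    # counts as FASTQ. Every other ending is treated as a read count file.
    if filename.endswith(".fastq.gz"):
        return "fastq"
    return "readcount"


if __name__ == "__main__":
    for name in ("aa.fastq.gz", "aa.fq.gz", "aa.fastq.gz.txt"):
        print(name, "->", classify_upload(name))
```

Under this rule, a file named 'aa.fq.gz' would be handled as a read count file even though it contains FASTQ data, which is why the file ending is the first thing worth checking.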

jwinter6 commented 7 years ago

Hi,

Sorry for the slightly misleading error message. CRISPRAnalyzeR is basically telling you that it extracted and mapped your FASTQ file, but the resulting read count file does not meet the expected criteria, namely sgRNA and Reads separated by a tab.

This can have different reasons, e.g. an issue with the FASTA library file.
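The criterion mentioned above (an sgRNA identifier and a read count separated by a tab) could be checked roughly as in the following Python sketch. The function name `looks_like_readcount`, the optional header line, and the requirement that counts are non-negative integers are assumptions for illustration, not the tool's real validation.

```python
def looks_like_readcount(lines, has_header=True):
    """Return True if every data line is: sgRNA, one tab, integer count."""
    # Drop line endings and skip blank lines.
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if has_header:
        # Assumption: the first non-blank line is a column header.
        rows = rows[1:]
    if not rows:
        return False
    for row in rows:
        fields = row.split("\t")
        if len(fields) != 2:
            return False
        sgrna, count = fields
        # The identifier must be non-empty and the count must be digits only.
        if not sgrna or not count.isdigit():
            return False
    return True


if __name__ == "__main__":
    good = ["sgRNA\tCount\n", "Trp53_1\t120\n", "Trp53_2\t0\n"]
    bad = ["sgRNA Count\n", "Trp53_1 120\n"]  # spaces instead of tabs
    print(looks_like_readcount(good), looks_like_readcount(bad))
```

A file that fails such a check after mapping points to the inputs of the mapping step, for example the FASTA library, rather than to the upload itself.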

Could you please tell me:

This information could help me a lot.

If you used the Docker version, could you please try the BETA, which you can run via (simplified):

docker run -p 80:3838 boutroslab/crispranalyzer:1.11BETA

I will also add more detailed error handling in the next few days.

Best, and thanks for your help, Jan

fzg26493 commented 7 years ago

Thanks for the comments. First, the names of the uploaded files end with '.fastq.gz'. I used a self-made FASTA file for the mouse GeCKO library (A+B). The FASTQ extraction setting was 'CG(.20})G'. The sgRNA identifier was the official gene symbol. I used the Docker version. I also tried 1.11BETA, but got the same result. Thanks for your help.
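For readers unfamiliar with the FASTQ extraction setting, the following Python sketch shows how a pattern of this kind pulls a target sequence out of a read. It assumes the intended pattern was 'CG(.{20})G', i.e. a 20-base target flanked by 'CG' and 'G'; the setting as quoted above is printed differently, so this is a guess. The function name `extract_target` and the example read are made up.

```python
import re

# Assumption: the intended extraction pattern, with a 20-base capture group.
PATTERN = re.compile(r"CG(.{20})G")


def extract_target(read):
    """Return the captured 20-base target sequence, or None if no match."""
    match = PATTERN.search(read)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up read: flank 'CG', a 20-base target, then flank 'G'.
    read = "TT" + "CG" + "ATATATATATATATATATAT" + "G" + "AA"
    print(extract_target(read))
```

Only the captured group is mapped against the FASTA library, so the flanking bases in the pattern have to match the vector backbone used in the screen.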

jwinter6 commented 7 years ago

Hi,

I will prepare some FASTA files for the mouse GeCKO v2 library using my toolchain tomorrow, and I would be happy if you could give them a try.

It's just a gut feeling :)

Thanks Jan

jwinter6 commented 7 years ago

Hi,

Could you give these a try?

3e615a8bc0e0d697df66345cc05590fc02fd3521

Please select mouse and MGI SYMBOL as identifier.

Moreover, I will update the 1.11BETA in a few minutes. Please try the files with this beta and let me know if it works for you now :)

Best Jan

fzg26493 commented 7 years ago

It works! Thanks a lot, Jan!

jwinter6 commented 7 years ago

Hi,

Great to hear that it works for you. In general, I will implement some additional checks to automatically detect and eliminate potential issues with FASTA files. In my experience, data files may have issues that derive from Excel usage, e.g. weird names (like dates), whitespace, line breaks, etc.

I added some files to fasta/dev that show, e.g. for GeCKO, how I generated the FASTA files. Moreover, I added a checkFasta.Rmd file to check FASTA files for correct entries (uniqueness, unwanted characters). You can find it in /helpers.
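As an illustration of the kind of checks mentioned above (uniqueness, unwanted characters), here is a minimal Python sketch. It is not the checkFasta.Rmd file from the repository, which is written in R Markdown. The function name `check_fasta` and the allowed character sets are assumptions.

```python
import re
from collections import Counter

# Assumptions: identifiers use letters, digits, '_', '.' and '-' only,
# and sequences contain only the bases A, C, G, T.
ID_OK = re.compile(r"[A-Za-z0-9_.\-]+")
SEQ_OK = re.compile(r"[ACGTacgt]+")


def check_fasta(lines):
    """Return a list of human-readable problems found in FASTA lines."""
    problems = []
    ids = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            problems.append(f"line {number}: empty line")
        elif line.startswith(">"):
            name = line[1:]
            ids.append(name)
            if not ID_OK.fullmatch(name):
                problems.append(f"line {number}: unexpected characters in identifier {name!r}")
        elif not SEQ_OK.fullmatch(line):
            problems.append(f"line {number}: unexpected characters in sequence")
    # Report every identifier that appears more than once.
    for name, count in Counter(ids).items():
        if count > 1:
            problems.append(f"identifier {name!r} occurs {count} times")
    return problems


if __name__ == "__main__":
    messy = [">Trp53_1\n", "ACGT\n", ">Trp53_1\n", "ACGT \n", ">2-Mar 1\n", "ACGT\n"]
    for problem in check_fasta(messy):
        print(problem)
```

The example input contains a duplicated identifier, a sequence with trailing whitespace, and a date-like name with a space, which are the kinds of spreadsheet-derived issues described above.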

Glad to see it works now!

Best Jan