boutroslab / CRISPRAnalyzeR

CRISPRAnalyzeR: interactive analysis, annotation and documentation of pooled CRISPR screens
GNU General Public License v2.0

Looking for the solution to several problems: Disconnected from the server and abnormal counting results. #38

Open Beardancer opened 6 years ago

Beardancer commented 6 years ago

I am using version 1.50 and have run into several problems. Thanks in advance to anyone who can help.

The first problem is that sometimes, when I upload my NGS sequencing files (.fastq.gz), the app disconnects from the server. I then have to reload the page, and the uploaded files are lost. That's really annoying.

The second problem concerns read counting. I chose the GeckoV2 A+B library. If I use the default regular expression "CACC(.{20})", the result looks like this:

sgRNA Extraction Ratio: 100%
Not aligned: 99.02%
Aligned once: 0.98%
Aligned multiple times: 0.00%

The counting result is just as bad: nearly all counts in the read count file are 0, and only a few are 1, 2 or 3.

If I use the regular expression "CACCG(.{20})", as DarioS suggested in https://github.com/boutroslab/CRISPRAnalyzeR/issues/26 , the result for the same sequencing file looks like this:

sgRNA Extraction Ratio: 97.39%
Not aligned: 30.87%
Aligned once: 68.12%
Aligned multiple times: 1.01%

This alignment result is quite close to the MAGeCK analysis result, but the counts in the read count file are still as low as before. In the MAGeCK count file, the count for the same sgRNA can be in the hundreds or even thousands. Since I cannot get a correct read count file, I can't continue with the analysis.

Here is a sample of the reads from my sequencing files:

@NS500762:128:HFLVFBGX5:1:11101:2102:1044 1:N:0:ATCGTGCT
GTAGTNCTTGTGGAAAGGACGAAACACCGCCTGCACTCGGAGAAGAACGGTTTTAGAGCTAGAAATAGCAAGTTAA
+
AAAAA#E/AEEEEEAEEEEEEEEEE/EEE<EEEEEE//EEEE/EEEEEEEEEEEEE6AEEEAEEEEEEEEEEEEEA
@NS500762:128:HFLVFBGX5:1:11101:4063:1044 1:N:0:ATCGTGCT
AGCTGNCTTGTGGAAAGGACGAAACACCGCAAGTTACCCCACGAGTCCTGTTTTAGAGCTAGAAATAGCAAGTTAA
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

The structure is "CACCG+(20bp sgRNA)+GTTT".
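
The difference between the two patterns can be checked directly on the two sample reads. The following is a minimal Python sketch, assuming the extraction step takes the first match of the pattern in each read and uses capture group 1 as the sgRNA. The helper name `extract_guide` is made up for illustration and is not part of CRISPRAnalyzeR.

```python
import re

# The two sample reads quoted in this post (sequence lines only).
READS = [
    "GTAGTNCTTGTGGAAAGGACGAAACACCGCCTGCACTCGGAGAAGAACGGTTTTAGAGCTAGAAATAGCAAGTTAA",
    "AGCTGNCTTGTGGAAAGGACGAAACACCGCAAGTTACCCCACGAGTCCTGTTTTAGAGCTAGAAATAGCAAGTTAA",
]


def extract_guide(read, pattern):
    """Return capture group 1 of the first match, or None if nothing matches.

    Hypothetical helper: it only mimics a regex-based extraction step.
    """
    match = re.search(pattern, read)
    return match.group(1) if match else None


for read in READS:
    # Default pattern reported in this issue.
    print("CACC(.{20})  ->", extract_guide(read, r"CACC(.{20})"))
    # Pattern suggested in issue #26.
    print("CACCG(.{20}) ->", extract_guide(read, r"CACCG(.{20})"))
```

On these reads, "CACC(.{20})" captures a 20-mer that begins with the extra G and drops the last base of the guide, which would be consistent with almost nothing aligning to the library. "CACCG(.{20})" captures exactly the 20 bp between CACCG and GTTT.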

It would be very kind if someone could help me get a correct read count file. @jwinter6

Thank you very much.

jwinter6 commented 6 years ago

Hi, thanks for your post.

You are right. Unfortunately, I have overwritten the default regular expression for GeckoV2. I will change this in a fixed version.

Please do me a favor:

When you upload the FASTQ data, please switch "Fast FASTQ processing" to OFF, as we have experienced an issue with this option for some FASTQ data in the latest version of the tool. This should eliminate the low read counts.

Please give it a try and let me know if this was the cause. Anyway, the GeckoV2 library really causes a lot of issues, but I will do my best to assist you :)

Cheers,
Jan

Beardancer commented 6 years ago

Hi, Jan. After I switched "Fast FASTQ processing" to OFF, it worked. Thank you very much.

I have another small question. After the analysis, there is a page under "Hit calling" > "Overview" that shows the significant candidates across all methods. After I generated the report, however, I could not find this page in it. This page is very important to me for showing the results to others. How can I add it to the report files?

Thank you again for this very useful analysis tool. I am looking forward to your reply.

Best wishes

mirabai-cuenca commented 5 years ago

I had the same problem using a custom library, and switching "Fast FASTQ processing" to OFF did not work. I also switched the optimization of the sgRNA library to OFF and then got slightly higher counts, but still not what they should be (always below 50, when they should go over 900). What can I do?

Thanks in advance
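
One rough way to tell whether low counts come from the extraction pattern or from the data itself is to count the extracted guides directly, outside the tool. The sketch below is only an illustration under stated assumptions: the pattern "CACCG(.{20})GTTT" is taken from the read structure described earlier in this thread and will differ for a custom library, and the function names `count_guides` and `count_guides_in_file` are hypothetical, not part of CRISPRAnalyzeR or MAGeCK.

```python
import gzip
import re
from collections import Counter
from itertools import islice


def count_guides(lines, pattern=r"CACCG(.{20})GTTT"):
    """Count capture group 1 over the sequence lines of FASTQ records.

    `lines` is any iterable of FASTQ lines (four lines per record).
    The default pattern is an assumption based on the GeckoV2 read
    structure quoted in this thread; adapt it to your own library.
    """
    regex = re.compile(pattern)
    counts = Counter()
    # The sequence is the 2nd line of every 4-line FASTQ record.
    for seq in islice(lines, 1, None, 4):
        match = regex.search(seq)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_guides_in_file(path, pattern=r"CACCG(.{20})GTTT"):
    """Same as count_guides, but reads a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return count_guides(handle, pattern)


# Tiny in-memory demo built from the sample reads posted above.
demo = [
    "@read1", "GTAGTNCTTGTGGAAAGGACGAAACACCGCCTGCACTCGGAGAAGAACGGTTTTAGAGCTAGAAATAGCAAGTTAA", "+", "E" * 76,
    "@read2", "AGCTGNCTTGTGGAAAGGACGAAACACCGCAAGTTACCCCACGAGTCCTGTTTTAGAGCTAGAAATAGCAAGTTAA", "+", "E" * 76,
    "@read3", "GTAGTNCTTGTGGAAAGGACGAAACACCGCCTGCACTCGGAGAAGAACGGTTTTAGAGCTAGAAATAGCAAGTTAA", "+", "E" * 76,
]
for guide, n in count_guides(demo).most_common():
    print(guide, n)
```

If the counts from such a check are in the expected range while the tool reports much lower values, the extraction pattern or a processing option is the more likely cause than the sequencing data.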