Russel88 / CRISPRCasTyper

CCTyper: Automatic detection and subtyping of CRISPR-Cas operons
https://typer.crispr.dk
MIT License

CCTyper results #43

Closed elozanoe closed 1 year ago

elozanoe commented 1 year ago

Hello! I've been using CCTyper and I'm very happy with the performance and results, but I have a question about the results of an isolate assembly. This is the CRISPR_Cas.tab file:

Contig	Operon	Operon_Pos	Prediction	CRISPRs	Distances	Prediction_Cas	Prediction_CRISPRs
Contig51	Contig51@1	[785, 2608]	I-B	['Contig51_1']	[177]	Ambiguous	['I-B']

However, when I go to the cas_operons.tab file, I find:

Contig	Operon	Start	End	Prediction	Complete_Interference	Complete_Adaptation	Best_type	Best_score	Genes	Positions	E-values	CoverageSeq	CoverageHMM	Strand_Interference	Strand_Adaptation
Contig137	Contig137@1	40	5517	I-B	100%	0%	I-B	15.0	['Cas6_0_CAS-III-B-I-B', 'Cas8b1_10_CAS-I-B', 'Cas7_0_CAS-I-B', 'Cas5_0_IB', 'Cas3_0_I']	[1, 2, 3, 4, 5]	['2.00e-53', '2.50e-250', '2.40e-48', '1.40e-19', '1.40e-34']	[0.979, 0.983, 0.935, 0.832, 0.545]	[0.979, 0.991, 0.967, 0.862, 0.475]	1	NA

This file shows the same results as the cas_operons_orphan.tab file. My question is why it seems to identify a complete CRISPR/Cas locus when it doesn't find the Cas operon. Interestingly, using Bakta to annotate this isolate, it identifies the Cas genes very close to the CRISPR array (less than 5000 nucleotides away and on the same contig). It seems that CCTyper recognizes that the CRISPR/Cas system is complete, but it doesn't display this operon. What could be happening? Thanks in advance!

Russel88 commented 1 year ago

Hi

My guess is that you have a cas operon in "cas_operons_putative.tab" on Contig51 with no definite subtype, which is close to a CRISPR array. Maybe it's cas1 and cas2, which seem to be missing from the I-B cas operon on Contig137. I can see that the output is confusing in this case.
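A quick way to check this is to look up the operon ID from CRISPR_Cas.tab in each of the operon tables. The following is only a minimal sketch, assuming the tables are tab-separated with an "Operon" column as in the excerpts in this thread; `find_operon` and the output directory argument are hypothetical names for illustration, not part of CCTyper.

```python
import csv
from pathlib import Path

# Table names as they appear in this thread; the directory layout is an assumption.
OPERON_FILES = (
    "cas_operons.tab",
    "cas_operons_putative.tab",
    "cas_operons_orphan.tab",
)


def find_operon(outdir, operon_id):
    """Return a list of (file name, row dict) for each table that lists operon_id."""
    hits = []
    for name in OPERON_FILES:
        path = Path(outdir) / name
        if not path.exists():
            # Skip tables that are not present in this output directory.
            continue
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                if row.get("Operon") == operon_id:
                    hits.append((name, row))
    return hits
```

Calling `find_operon("my_output", "Contig51@1")` (with a hypothetical output directory `my_output`) would then show which table the operon paired with the CRISPR array actually comes from.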

elozanoe commented 1 year ago

These are the results for contig 51 on cas_operons_putative.tab file:

Contig	Operon	Start	End	Prediction	Complete_Interference	Complete_Adaptation	Best_type	Best_score	Genes	Positions	E-values	CoverageSeq	CoverageHMM	Strand_Interference	Strand_Adaptation
Contig51	Contig51@1	785	2608	Ambiguous	['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%']	['100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', '100%', 'NA']	['I-A', 'I-B', 'I-C', 'I-D', 'I-G', 'II-B', 'V-A', 'V-B1', 'V-B2', 'V-E', 'V-F1', 'V-F2', 'V-F3', 'V-F']	6.0	['Cas4_5_CAS-I-II-III-IV-V-VI', 'Cas1_6_CAS-I-II-III-IV-V-VI', 'Cas2_2_CAS-I-II-III-IV-V-VI']	[2, 3, 4]	['1.30e-49', '1.90e-120', '8.40e-24']	[0.976, 0.964, 0.789]	[1.0, 0.994, 0.975]	NA	1

It seems a bit ambiguous, but I'm posting the results in case the same thing happens to another user.

Russel88 commented 1 year ago

Yeah, so you have an adaptation module (cas1, cas2, cas4) next to a CRISPR array on Contig51, and the best guess CCTyper can give is I-B due to the CRISPR repeat sequence. And then you have an I-B interference module on Contig137.
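The split described above can be read directly from the Genes column of the two operon rows. This is a rough sketch, assuming the gene name is the part of each HMM label before the first underscore (an assumption based on the labels shown in this thread) and treating cas1, cas2 and cas4 as the adaptation genes, as in the reply above; `split_modules` is a hypothetical helper.

```python
import ast

# Adaptation genes as named in the reply above.
ADAPTATION = {"Cas1", "Cas2", "Cas4"}


def split_modules(genes_field):
    """Split a Genes cell into (adaptation genes, all other genes)."""
    # The cell is a Python-style list literal, e.g. "['Cas3_0_I']".
    labels = ast.literal_eval(genes_field)
    adaptation, other = [], []
    for label in labels:
        # Assumption: the gene name precedes the first underscore.
        gene = label.split("_", 1)[0]
        (adaptation if gene in ADAPTATION else other).append(gene)
    return adaptation, other


contig51 = "['Cas4_5_CAS-I-II-III-IV-V-VI', 'Cas1_6_CAS-I-II-III-IV-V-VI', 'Cas2_2_CAS-I-II-III-IV-V-VI']"
contig137 = "['Cas6_0_CAS-III-B-I-B', 'Cas8b1_10_CAS-I-B', 'Cas7_0_CAS-I-B', 'Cas5_0_IB', 'Cas3_0_I']"

print(split_modules(contig51))   # (['Cas4', 'Cas1', 'Cas2'], [])
print(split_modules(contig137))  # ([], ['Cas6', 'Cas8b1', 'Cas7', 'Cas5', 'Cas3'])
```

With the rows from this issue, Contig51 carries only adaptation genes and Contig137 carries none, which matches the two-contig picture described in the reply.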

elozanoe commented 1 year ago

Thanks!