Russel88 / CRISPRCasTyper

CCTyper: Automatic detection and subtyping of CRISPR-Cas operons
https://typer.crispr.dk
MIT License

XGBoost model incompatible #46

Closed: Matt-BF closed this issue 6 months ago

Matt-BF commented 6 months ago

Hi! Great program, it has been helping me a lot! I've been running cctyper on metagenomes and it has worked for the most part, but for one of my FASTA files it fails with the error "XGBoost model incompatible". I am using cctyper v1.8.0, installed via mamba in a fresh environment created as instructed in the README.

Thanks for any help in this situation!

cctyper part_001.fasta part_001_results --prodigal meta
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/bin/cctyper:7: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
[2024-02-14 06:27:58] INFO: Running CRISPRCasTyper version 1.8.0
[2024-02-14 06:28:01] INFO: Predicting ORFs with prodigal
[2024-02-14 07:20:09] INFO: Running HMMER against Cas profiles
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 705/705 [54:41<00:00,  4.65s/it]
[2024-02-14 08:17:52] INFO: Subtyping putative operons
[2024-02-14 08:18:08] INFO: Predicting CRISPR arrays with minced
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/clusterfs/jgi/groups/science/homes/mbfiamenghi/.micromamba/envs/cctyper/lib/python3.8/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
[2024-02-14 08:18:50] INFO: BLASTing for CRISPR near cas operons
[2024-02-14 08:21:23] INFO: Predicting subtype of CRISPR repeats
[2024-02-14 08:21:23] ERROR: XGBoost model incompatible
Russel88 commented 6 months ago

It's probably because of the XGBoost version. I would pin py-xgboost to version 1.7.1, and probably also Biopython to 1.76 to avoid those warnings.

Looks like I have to update the version requirements.
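
A minimal Python sketch for confirming that an environment actually picked up the suggested pins (py-xgboost 1.7.1, Biopython 1.76) before rerunning. It assumes the conda package py-xgboost installs the Python distribution named `xgboost` and that Biopython's distribution is named `biopython`; the helper names are hypothetical and not part of CCTyper.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins suggested in this thread. Distribution names are an assumption:
# the conda package "py-xgboost" is expected to install the "xgboost" distribution.
PINS = {"xgboost": "1.7.1", "biopython": "1.76"}


def installed_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed, pins):
    """List (name, expected, actual) for every package that differs from its pin."""
    return [
        (name, expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    ]


if __name__ == "__main__":
    for name, expected, actual in pin_mismatches(installed_versions(PINS), PINS):
        print(f"{name}: expected {expected}, found {actual}")
```

Run it with the cctyper environment's own Python; no output means both pins are in place.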

Matt-BF commented 6 months ago

Hi, thanks for the quick response! I downgraded py-xgboost to 1.7.1 and Biopython to 1.76 in my env, but I still get the same "XGBoost model incompatible" error (though without the deprecation warnings).

It's strange, because I have 528 of these FASTA files; all the others apparently ran successfully, and only this one gave me this error.

Russel88 commented 6 months ago

That's strange. It could be unusual characters in the FASTA headers of that single file, or maybe lowercase nucleotides. Is any output produced at all?
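
One way to test both suspicions at once is a small scan of the FASTA file. This is a hypothetical helper, not part of CCTyper, and "unusual" is an assumption here: an empty header or any character outside printable ASCII (tabs included).

```python
def check_fasta(text):
    """Scan FASTA text; return (odd_headers, ids_with_lowercase_bases)."""
    odd_headers = []
    lowercase_ids = []
    record_id = None
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:]
            tokens = header.split()
            record_id = tokens[0] if tokens else ""
            # Assumed rule: flag empty headers and any non-printable or non-ASCII character.
            if not tokens or any(not (32 <= ord(c) < 127) for c in header):
                odd_headers.append(header)
        elif record_id is not None and any(c.islower() for c in line):
            # Record each sequence ID once, even if many of its lines are lowercase.
            if record_id not in lowercase_ids:
                lowercase_ids.append(record_id)
    return odd_headers, lowercase_ids
```

Usage: `odd, lower = check_fasta(open("part_001.fasta").read())`; both lists empty means neither explanation applies.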

Matt-BF commented 6 months ago

Yes. From what I can see, I have most of the files except Crispr_Cas.tab, and crisprs_all.tab does not have the prediction, subtype or subtype probability columns. Everything else seems the same.

Russel88 commented 6 months ago

Can you post the crisprs_all.tab file here?

Matt-BF commented 6 months ago

I had to rename it to .txt so GitHub would allow me to post it: crisprs_all.txt

Russel88 commented 6 months ago

Aha, the problem is probably this repeat: GGGGGGGGGGGGGGGGGGNNNNN, which also looks very odd. I don't think the repeat classifier can handle N's; CRISPR arrays usually do not span stretches of N's.
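
A sketch for spotting such repeats in crisprs_all.tab before they reach the classifier. It assumes the file is tab-separated with columns named `Contig` and `Consensus_repeat`; those names are assumptions, so they are parameters of this hypothetical helper.

```python
import csv
import io

ACGT = set("ACGT")


def repeats_with_ambiguous_bases(tsv_text, repeat_col="Consensus_repeat", contig_col="Contig"):
    """Return (contig, repeat) pairs whose repeat contains anything other than A/C/G/T."""
    # Column names are assumptions about the crisprs_all.tab layout; adjust if they differ.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        repeat = row[repeat_col].strip().upper()
        if set(repeat) - ACGT:  # N or any other non-ACGT symbol
            flagged.append((row[contig_col], repeat))
    return flagged
```

Usage: `repeats_with_ambiguous_bases(open("part_001_results/crisprs_all.tab").read())` lists the contigs worth inspecting.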

A quick fix for you would be to remove that contig from the FASTA file.
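
A minimal, dependency-free sketch of that fix, assuming the record ID is the first word of the header. The helper name and the contig ID are hypothetical; Biopython's `SeqIO.parse` and `SeqIO.write` could do the same job.

```python
def drop_records(fasta_text, ids_to_drop):
    """Return FASTA text without the records whose ID (first header word) is in ids_to_drop."""
    drop = set(ids_to_drop)
    kept = []
    keep = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Decide once per record; sequence lines inherit the decision.
            keep = (tokens[0] if tokens else "") not in drop
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

Usage: write `drop_records(open("part_001.fasta").read(), ["<contig_id>"])` to a new file and rerun cctyper on it.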

Matt-BF commented 6 months ago

Yeah, removing the sequence from the original FASTA file solved it! Thanks for your help debugging!