Russel88 / CRISPRCasTyper

CCTyper: Automatic detection and subtyping of CRISPR-Cas operons
https://typer.crispr.dk
MIT License

Add option to input GFF and protein FASTA, and output prot_ids #41

Closed by pentamorfico 1 year ago

pentamorfico commented 1 year ago

Hi Jakob! Hope everything is going well!

I'm submitting a pull request that adds an option to parse a GFF file together with a protein FASTA file. The option is activated only when both files are provided. So far it has been tested exclusively with NCBI GFF files; further validation is needed to confirm compatibility with GFF files from Prodigal, Prokka, and Bakta. Currently, the protein accession ID is taken from the "protein_id" field of the NCBI GFF attributes, and only CDS features are considered.
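For readers unfamiliar with the GFF layout, the extraction described above can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request: the function name `parse_gff_cds`, the returned dictionary keys, and the example accessions are all hypothetical, and it assumes NCBI-style GFF3 with nine tab-separated columns and `key=value` attributes separated by semicolons.

```python
from urllib.parse import unquote


def parse_gff_cds(lines):
    """Yield one dict per CDS feature that carries a protein_id attribute.

    Hypothetical helper for illustration; not part of CCTyper.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop parsing features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS features are considered
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = unquote(value.strip())
        protein_id = attrs.get("protein_id")
        if protein_id is None:
            continue  # e.g. pseudogenes have no protein accession
        yield {
            "contig": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": 1 if cols[6] == "+" else -1,
            "protein_id": protein_id,
        }


# Made-up example records in NCBI style
example = [
    "##gff-version 3",
    "NC_000001.1\tRefSeq\tgene\t100\t400\t.\t-\t.\tID=gene-X;Name=cas1",
    "NC_000001.1\tProtein Homology\tCDS\t100\t400\t.\t-\t0\t"
    "ID=cds-WP_000000001.1;Parent=gene-X;protein_id=WP_000000001.1",
    "NC_000001.1\tRefSeq\tCDS\t500\t800\t.\t+\t0\tID=cds-pseudo;pseudo=true",
]
records = list(parse_gff_cds(example))
```

GFF files from Prodigal, Prokka, or Bakta may store the identifier under a different attribute key, which is one reason the additional validation mentioned above matters.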

Moreover, the Cas output tabs now include protein IDs, which were missing previously.

Lastly, I've addressed some deprecation warnings from numpy, Biopython, and pkg_resources, with the latter now switched to importlib.metadata.
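As a rough sketch of what such a switch typically looks like (not the actual change in this pull request), a version lookup based on `importlib.metadata` could be written as below. The helper name `get_version` and the `"unknown"` fallback are assumptions made for illustration; `importlib.metadata` is in the standard library from Python 3.8 onwards.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name):
    """Return the installed version of a distribution, or "unknown".

    Hypothetical helper; stands in for the older
    pkg_resources.get_distribution(dist_name).version pattern.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return "unknown"


# A name that is assumed not to be installed, to show the fallback path
missing = get_version("surely-not-an-installed-distribution-xyz")
```

The distribution name passed in must match the name used at install time, which is not necessarily the same as the importable module name.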

Hope you find it useful!

Russel88 commented 1 year ago

Hi Mario, thank you for implementing this! It looks good, but I will also do some testing before releasing it as a new version. Could you make a pull request to the dev branch? Then I can add a few additional things before releasing.

pentamorfico commented 1 year ago

OK, I will do it right now! I also noticed an issue with the GFF input.