gamcil / clinker

Gene cluster comparison figure generator
MIT License
507 stars, 66 forks

Elaborate on the input file contents #65

Status: Closed (multimeric closed this issue 3 years ago)

multimeric commented 3 years ago

Hi, I'm trying to integrate clinker into a pipeline I'm working on. However, I'm confused about a few things:

multimeric commented 3 years ago

This function makes me think that each GenBank file is a cluster, and each sequence within that file is a gene: https://github.com/gamcil/clinker/blob/10c7a1da6a30cf7b93d28f67474c9a34f098b61d/clinker/classes.py#L201

gamcil commented 3 years ago

Hey,

The examples folder has some good examples of valid input. Hope that helps!

multimeric commented 3 years ago

Great, thanks! And just to clarify, I assume that means the GenBank files have to be nucleotide sequences, not protein sequences (the GenPept format)? I'm asking about details that don't normally matter because I'm not using GenBank itself as the source of my annotations. Rather, I have to create these files from another data source, which is why I'm after the specifics.

gamcil commented 3 years ago

Yeah, you're right: nucleotide GenBank, not GenPept, since we want to display the genomic region the genes are on. That said, you don't necessarily need the nucleotide sequence if your CDS sequence features all have protein translations in them, as long as the overall GenBank format is still valid (I think some other pipelines generate files like that).
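
For anyone generating these files from another data source, a quick sanity check along the lines of the last comment might look like the sketch below. It scans the feature table of a GenBank record and reports any CDS feature without a `/translation` qualifier. This is not clinker's own parser: the function name `cds_missing_translation` and the `SAMPLE` record are hypothetical, and the scan assumes the standard flat-file layout (feature keys starting at column 6, qualifiers on deeper-indented lines).

```python
import re

# Hypothetical two-gene record: gene1 has a translation, gene2 does not.
SAMPLE = """\
LOCUS       cluster1                 180 bp    DNA     linear   UNK 01-JAN-1980
FEATURES             Location/Qualifiers
     CDS             1..90
                     /locus_tag="gene1"
                     /translation="MKT"
     CDS             complement(100..180)
                     /locus_tag="gene2"
ORIGIN
//
"""


def cds_missing_translation(genbank_text):
    """Return the locations of CDS features that lack a /translation qualifier."""
    missing = []
    in_features = False
    current = None  # [location, has_translation] for the CDS being scanned
    for line in genbank_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        # Feature keys sit after exactly five spaces; qualifier lines are
        # indented further, so this pattern does not match them.
        key_match = re.match(r"^ {5}(\S+)\s+(\S+)", line)
        ends_table = not line.startswith(" ")  # e.g. ORIGIN or //
        if key_match or ends_table:
            # Close out the CDS we were scanning, if any.
            if current is not None and not current[1]:
                missing.append(current[0])
            current = None
            if key_match and key_match.group(1) == "CDS":
                current = [key_match.group(2), False]
            if ends_table:
                in_features = False
        elif current is not None and line.strip().startswith("/translation="):
            current[1] = True
    # Handle a record that ends while still inside a CDS feature.
    if current is not None and not current[1]:
        missing.append(current[0])
    return missing


print(cds_missing_translation(SAMPLE))  # ['complement(100..180)']
```

An empty list means every CDS carries a translation, which is the case described above where the nucleotide sequence may not be strictly needed. The check says nothing about whether the rest of the file is valid GenBank, so comparing against the files in the examples folder is still worthwhile.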