multimeric closed this issue 3 years ago
This function makes me think that each genbank file is a cluster, and each sequence within that file is a gene: https://github.com/gamcil/clinker/blob/10c7a1da6a30cf7b93d28f67474c9a34f098b61d/clinker/classes.py#L201
Hey,
The examples folder has some good examples of valid input. Hope that helps!
Great, thanks! And just to clarify, I assume that means the GenBank files have to be nucleotide sequences, not protein sequences (the GenPept format)? I'm asking about details that don't normally matter because I'm not using GenBank itself as the source of my annotations. Instead, I have to create these files from another data source, which is why I'm after the specifics.
Yeah you're right, nucleotide GenBank, not GenPept, since we want to display the genomic region the genes are on. Though, you don't necessarily need nucleotide sequence if your CDS sequence features all have protein translations in them, as long as the overall GenBank format is still valid (I think some other pipelines generate files like that).
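If you are generating the files yourself, a quick pre-flight check along these lines might help. This is only a sketch of the rule as described in the reply above (a record needs either nucleotide sequence or a protein translation on every CDS feature). The `CDS` and `Record` classes and both helper functions are hypothetical stand-ins for whatever your pipeline holds in memory; they are not clinker classes, and clinker's actual parsing may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CDS:
    """One CDS feature (hypothetical stand-in, not a clinker class)."""
    locus_tag: str
    start: int
    end: int
    strand: int
    translation: Optional[str] = None  # value of the /translation qualifier


@dataclass
class Record:
    """One nucleotide GenBank record (hypothetical stand-in)."""
    name: str
    sequence: str = ""  # may be empty if every CDS carries a translation
    features: List[CDS] = field(default_factory=list)


def missing_translations(record: Record) -> List[str]:
    """Return the locus tags of CDS features without a translation."""
    return [cds.locus_tag for cds in record.features if not cds.translation]


def looks_usable(record: Record) -> bool:
    """Assumed rule: at least one CDS, plus either nucleotide sequence
    or a translation on every CDS feature."""
    if not record.features:
        return False
    return bool(record.sequence) or not missing_translations(record)


record = Record(
    "cluster1",
    features=[CDS("geneA", 0, 300, 1, "MKT"), CDS("geneB", 400, 700, -1)],
)
print(missing_translations(record))  # ['geneB']
print(looks_usable(record))          # False
```

Here `geneB` has no translation and the record has no nucleotide sequence, so the check fails; adding either one would make it pass under the assumed rule.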
Hi, I'm trying to integrate clinker into a pipeline I'm working on. However I'm confused about a few things: