NaegleLab / CoDIAC

GNU General Public License v3.0

Create an architecture-based filter for an alignment file #41

Closed: knaegle closed this issue 6 months ago

knaegle commented 7 months ago

Is your feature request related to a problem? Please describe.
We often want to look at features on a subset of proteins while still using the global alignment. Typically, these subsets are defined by particular domain architectures.

Describe the solution you'd like
A Jalview feature that takes three inputs (uniprot_reference_file, domain_architecture_list, and global_alignment_fasta_file) and writes a new FASTA file that keeps only the UniProt sequences whose domain architecture appears in domain_architecture_list. We suggest that the output file name either be requested from the user or be generated from the global file name, with information about the filter step appended.
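A minimal Python sketch of the requested filter, for illustration only. It assumes the UniProt reference file is a CSV with hypothetical "UniProt ID" and "Domain Architecture" columns, and that each FASTA header starts with the UniProt accession, optionally followed by "|"-separated fields. The function names and the "_architecture_filtered" suffix are hypothetical, not CoDIAC's API.

```python
import csv
import os


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, keeping alignment gaps."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def header_to_uniprot_id(header):
    # Assumption: the accession is the first "|"-separated field of the
    # first whitespace-delimited token in the header.
    tokens = header.split()
    return tokens[0].split("|")[0] if tokens else ""


def filter_fasta_by_architecture(uniprot_reference_file, domain_architecture_list,
                                 global_alignment_fasta_file, output_file=None,
                                 id_col="UniProt ID", arch_col="Domain Architecture"):
    """Write aligned sequences whose architecture is in domain_architecture_list.

    id_col and arch_col are hypothetical column names for the reference CSV.
    Returns (output_file, number_of_sequences_written).
    """
    wanted = set(domain_architecture_list)
    with open(uniprot_reference_file, newline="") as handle:
        keep_ids = {row[id_col] for row in csv.DictReader(handle)
                    if row[arch_col] in wanted}

    if output_file is None:
        # Derive the name from the global alignment and record the filter step.
        root, ext = os.path.splitext(global_alignment_fasta_file)
        output_file = f"{root}_architecture_filtered{ext or '.fasta'}"

    kept = 0
    with open(output_file, "w") as out:
        for header, seq in read_fasta(global_alignment_fasta_file):
            if header_to_uniprot_id(header) in keep_ids:
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return output_file, kept
```

Because sequences are copied verbatim from the global alignment (gaps included), the filtered file stays in the same column coordinates, so features mapped onto the global alignment still line up. Returning the count makes it easy to notice an architecture list that matched nothing.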

Describe alternatives you've considered
We considered filtering a FASTA file based on the feature file, i.e., keeping only the headers that appear in the features. We thought that would be less satisfactory for cases where experimental data doesn't exist but you still want to see features across the family of domains with that architecture.

alekhyaa2 commented 6 months ago

The uniprot_reference_file and global_alignment_fasta_file are used to filter domain-specific FASTA files. The domain architecture list is not an input to any of the functions defined; instead, it is generated from the input uniprot_reference_file.
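One way such a list could be generated from the reference file, sketched in Python. This is an assumption about the approach, not CoDIAC's code: it supposes each architecture is stored as domain names joined by "|" in a hypothetical "Domain Architecture" column, and collects every distinct architecture that contains the domain of interest.

```python
import csv


def architectures_with_domain(uniprot_reference_file, domain_name,
                              arch_col="Domain Architecture", sep="|"):
    """Return the sorted, de-duplicated architectures that include domain_name.

    arch_col and sep are hypothetical: an architecture is assumed to be stored
    as domain names joined by sep, e.g. "SH3|SH2|Pkinase".
    """
    found = set()
    with open(uniprot_reference_file, newline="") as handle:
        for row in csv.DictReader(handle):
            architecture = row[arch_col].strip()
            # Match whole domain names, so "SH2" does not match "SH2B".
            if domain_name in architecture.split(sep):
                found.add(architecture)
    return sorted(found)
```

The resulting list plays the role of the domain_architecture_list from the original request, so users only need to name the domain rather than enumerate architectures by hand.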

Created a function, _jalviewFunctions.domain_specificfastafile, that outputs a FASTA file with the sequences of genes that contain the domain of interest. Also created _jalviewFunctions.domain_specificfeafile, which filters features for the domain of interest; its output can be loaded onto the FASTA file created by _jalviewFunctions.domain_specificfastafile.
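For the feature side, a hedged Python sketch of what a filter like domain_specificfeafile might do (not its actual implementation): keep only the Jalview feature lines whose sequence ID matches a header in the domain-specific FASTA file. It assumes the tab-delimited Jalview features layout (description, sequence ID, sequence index, start, end, feature type, optional score), a file with no embedded GFF section, and feature sequence IDs that match the first token of each FASTA header. All function names here are hypothetical.

```python
def fasta_ids(fasta_path):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def filter_feature_lines(lines, keep_ids):
    """Keep feature lines for sequences in keep_ids, plus all non-feature lines."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Colour definitions, STARTGROUP/ENDGROUP markers and blank lines have
        # fewer than six fields; pass them through unchanged.
        if len(fields) < 6 or fields[1] in keep_ids:
            kept.append(line)
    return kept


def filter_feature_file(feature_path, domain_fasta_path, output_path):
    """Write the filtered feature file and return the number of lines kept."""
    keep_ids = fasta_ids(domain_fasta_path)
    with open(feature_path) as handle:
        kept = filter_feature_lines(handle.readlines(), keep_ids)
    with open(output_path, "w") as out:
        out.writelines(kept)
    return len(kept)
```

Keeping the colour definitions and group markers means the filtered file should still render with the same styling when loaded onto the domain-specific alignment.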