CDCgov / MicrobeTrace

The Visualization Multitool for Molecular Epidemiology and Bioinformatics
https://microbetrace.cdc.gov/
Apache License 2.0

Import MEGA Files #47

Open: AABoyles opened this issue 5 years ago

AABoyles commented 5 years ago

Background: MEGA is one of the most widely used tools for viewing, editing, and aligning biological sequences. Accepting MEGA output files would broaden MicrobeTrace's user base to include academics and international scholars who already work in MEGA. Accepting MEGA output would also let users bring in alignments produced by MEGA's more robust multiple sequence alignment tools, which can handle more complex sequence data.

Open Task Description: MicrobeTrace ingests FASTA files for sequence data. A similar file format is the MEGA Sequential data format, and it would be nice for MicrobeTrace to be able to import MEGA Sequential files as well.
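To make the difference between the two formats concrete: a FASTA record begins with a > label line, while a MEGA file opens with a #MEGA header line, may contain commands such as !Title ...; that end in a semicolon, and prefixes each sequence label with #. The helper below is a hypothetical sketch (the name detectSequenceFormat and its return values are illustrative, not part of MicrobeTrace) that classifies a file from its first non-blank line.

```javascript
// Hypothetical helper (not part of MicrobeTrace): guess the sequence file
// format from the first non-blank line of the file's text.
// FASTA records start with ">"; MEGA files start with a "#MEGA" header,
// matched case-insensitively here.
function detectSequenceFormat(text) {
  const firstLine = text.split(/\r?\n/).find(line => line.trim() !== "");
  if (firstLine === undefined) return "unknown"; // empty or whitespace-only file
  const head = firstLine.trim();
  if (/^#mega\b/i.test(head)) return "mega";
  if (head.startsWith(">")) return "fasta";
  return "unknown";
}

console.log(detectSequenceFormat("#MEGA\n!Title demo;\n#s1\nACGT")); // mega
console.log(detectSequenceFormat(">s1\nACGT"));                     // fasta
console.log(detectSequenceFormat("s1,ACGT"));                       // unknown
```

Checking only the first non-blank line keeps the test cheap, since the whole file does not need to be parsed before choosing a parser.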

To accomplish this:

  1. Take a look at the app.parseFASTA function (located in the scripts/common.js file). Copy it to app.parseMEGA and modify the copy to look for # instead of > at the start of each sequence label. Also, devise a check for the MEGA header (FASTA files don't have headers).
  2. In components/files.html, once the user clicks "Submit", add a check for whether the file is a MEGA file. If it is, handle it exactly as you would a FASTA file, but use app.parseMEGA instead of app.parseFASTA.
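As a rough illustration of step 1, here is a minimal sketch of a MEGA Sequential parser. It is not the parser from PR #184. The standalone name parseMEGA (rather than app.parseMEGA), the { id, seq } record shape, and the sequential-only handling are assumptions, because app.parseFASTA's exact return value is not shown in this issue. Note that the #MEGA header itself starts with #, so simply swapping > for # would treat the header as a sequence label; the sketch consumes the header first and skips ! commands, which may span several lines until a semicolon.

```javascript
// Hypothetical sketch of a MEGA Sequential parser (not the PR #184 code).
// Assumption: records are { id, seq } objects, which may differ from what
// app.parseFASTA actually returns in scripts/common.js.
// Assumption: sequential format only (each "#label" appears once).
function parseMEGA(text) {
  const lines = text.split(/\r?\n/);
  let i = 0;
  // Skip leading blank lines, then require the "#MEGA" header.
  while (i < lines.length && lines[i].trim() === "") i++;
  if (i >= lines.length || !/^#mega\b/i.test(lines[i].trim())) {
    throw new Error("Not a MEGA file: missing #MEGA header");
  }
  i++; // the header is not a sequence label

  const records = [];
  let current = null;
  let inCommand = false; // true while inside a multi-line "!...;" command

  for (; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue;
    if (inCommand || line.startsWith("!")) {
      // Commands such as "!Title ...;" end at ';' and may span lines.
      inCommand = !line.includes(";");
      continue;
    }
    if (line.startsWith("#")) {
      // "#label" starts a new record; any text after whitespace on the
      // same line is treated as the start of the sequence.
      const [label, ...rest] = line.slice(1).split(/\s+/);
      current = { id: label, seq: rest.join("") };
      records.push(current);
    } else if (current) {
      // Continuation line: append residues, dropping internal whitespace.
      current.seq += line.replace(/\s+/g, "");
    }
  }
  return records;
}

// Usage with a small made-up example file.
const example = [
  "#MEGA",
  "!Title toy example;",
  "!Format DataType=DNA;",
  "",
  "#seq1",
  "ACGT",
  "ACGT",
  "#seq2",
  "TTGA",
].join("\n");
console.log(parseMEGA(example).map(r => `${r.id}:${r.seq}`).join(" "));
// seq1:ACGTACGT seq2:TTGA
```

For step 2, the submit handler in components/files.html would branch on a header check and call a parser like this in place of app.parseFASTA; how that wiring looks depends on code not shown in this issue. Interleaved MEGA files, where labels repeat across blocks, would need records merged by label.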

GrooveCS commented 3 years ago

Is this still an Open Issue?

AABoyles commented 3 years ago

Yes it is! PR #184 resolved the first half of the issue (creating the MEGA parser), but didn't address the second (integrating it into the application logic).

GrooveCS commented 3 years ago

Would you be open to us providing support?

ells commented 3 years ago

@GrooveCS Thank you for asking, we welcome open source contributions on all of our issues! Feel free to interact with us here or via email at microbetrace@cdc.gov.

AABoyles commented 3 years ago

@GrooveCS Agreed, please do! You'll make our day ;)