cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Add parameter to main in command.py #503

Closed afdhalrashid closed 1 year ago

afdhalrashid commented 1 year ago

I have many individuals, each identified by a unique IC number. I want to pass that number to the main function as a parameter. How do I add a new parameter, and how do I run the pangolin command in the terminal with it?

Currently the pangolin command receives only one parameter, the file name.

Is it just a matter of adding the parameter to the main function and running pangolin filename ic_number, or something like that?

AngieHinrichs commented 1 year ago

To make sure I understand the question: do you have a fasta file that contains multiple sequences with different names, similar to this?

>ic101
...
>ic102
...
>ic103
...

-- and you want to run pangolin on only one sequence, like ic102 but not ic101 or ic103?

If that is what you are asking -- in my opinion, it is better to use a general-purpose program or toolkit for working with fasta files to extract only the sequence(s) that you want from the big fasta file into a small fasta file, and then run pangolin on the small fasta file. You can write a very small shell script or python script to do that. Learning to use a standard toolkit for manipulating fasta files will have many benefits and future uses, and modifying pangolin is complicated.
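As a rough illustration of the "very small python script" idea, here is a minimal sketch that keeps only the named records from FASTA text. The function name extract_records and the file names in the usage note are hypothetical, and the sketch assumes the record name is the first whitespace-delimited token after ">". A standard FASTA toolkit would do the same job more robustly.

```python
def extract_records(fasta_text, wanted_names):
    """Return FASTA text containing only the records named in wanted_names.

    Hypothetical helper for illustration; not part of pangolin.
    """
    wanted = set(wanted_names)
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the record name is the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = name in wanted
        if keep:
            # Header and sequence lines of a wanted record are copied as-is.
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

To use it, read the big FASTA file, write the returned text to a small file (for example small.fasta), and then run pangolin small.fasta as usual.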

If I misunderstood what you are asking, then please provide a simple example.

afdhalrashid commented 1 year ago

I do not have a file containing multiple sequences with different names. I want to upload a FASTA file through a PHP web app. Each FASTA file is tied to a unique IC number stored in a MySQL database (table A). The web app will then run pangolin on that file and save the result in table B, which will have an ic_number column plus one column for each field in the pangolin output header.

So far, I have successfully altered the pangolin code by adding a function that inserts into the MySQL database. I added one argument, ic_number, to the main function in command.py. The new command, which I have run successfully, is:

pangolin filename.fasta ic_number

My remaining question is: does pangolin receive updates that change how it analyzes the FASTA file?

AngieHinrichs commented 1 year ago

You have the skills to modify pangolin to do what you want, which is great! But if I were trying to do something similar, I would avoid modifying pangolin, because some day pangolin will be updated, and updating a modified pangolin is more complicated and difficult. Instead I would write a script that runs pangolin, reads the assignment from pangolin's output file (default name is lineage_report.csv), and makes a mysql command to update table B. This kind of "glue logic" to adapt the output of one tool to become the input of another tool is everywhere in bioinformatics and in programming in general.
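A minimal sketch of such glue logic in Python might look like the following. It runs an unmodified pangolin, reads lineage_report.csv, and builds a parameterized INSERT for table B. The function names, the table name table_b, and the ic_number column are hypothetical, taken from the description earlier in this thread. The %s placeholder style is the one used by common Python MySQL drivers, and the actual database call is left out because it depends on which driver you use.

```python
import csv
import subprocess
from pathlib import Path


def run_pangolin(fasta_path, outdir):
    """Run an unmodified pangolin and return the path of its report."""
    outdir = Path(outdir)
    # --outdir is an existing pangolin option; the report keeps its default name.
    subprocess.run(["pangolin", str(fasta_path), "--outdir", str(outdir)], check=True)
    return outdir / "lineage_report.csv"


def report_to_rows(report_path, ic_number):
    """Read the report and attach the IC number to every row."""
    with open(report_path, newline="") as handle:
        return [{"ic_number": ic_number, **row} for row in csv.DictReader(handle)]


def build_insert(table, row):
    """Build a parameterized INSERT statement and its values.

    Assumption: the table's column names match the report's header fields,
    and the table name comes from trusted code, not from user input.
    """
    columns = list(row)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [row[column] for column in columns]
```

The web app would call run_pangolin on the uploaded file, pass the report path and the IC number to report_to_rows, and hand each statement from build_insert to its database cursor. When pangolin is updated, only this script needs checking, and only if the report columns change.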

"does pangolin receive updates that change how it analyzes the FASTA file?"

Yes, in fact pangolin-data v1.18 was released only a few hours ago. You can update your pangolin installation to use the new pangolin-data without updating the pangolin software itself with this command:

pangolin --update-data

pangolin itself was updated to v4.2 eight days ago, with a speedup for the default (usher) analysis mode.

aineniamh commented 1 year ago

I'd recommend filtering the file prior to running pangolin (or filtering the output csv after running) rather than trying to add in custom parameters for this use case. I'm going to close this now as it seems @AngieHinrichs answered all the questions!
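For the second option, filtering the output csv after running, a minimal Python sketch could look like this. The function name filter_report is hypothetical, and the sketch assumes the report identifies each sequence in a column named taxon.

```python
import csv
import io


def filter_report(report_text, wanted_taxa):
    """Return report CSV text containing only the rows for wanted_taxa.

    Hypothetical helper; assumes a 'taxon' column holds the sequence name.
    """
    wanted = set(wanted_taxa)
    reader = csv.DictReader(io.StringIO(report_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["taxon"] in wanted:
            writer.writerow(row)
    return out.getvalue()
```

Reading lineage_report.csv into a string and passing it through filter_report leaves pangolin itself untouched.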