PlantProteomes / SeqComparison

A project for comparing plant proteome sequences
Apache License 2.0

Extract all the lowest P00n for each Maize v5 protein #10

Open edeutsch opened 2 years ago

edeutsch commented 2 years ago

For the file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.2.fa

please generate a plain text file that is the lowest P00n number for each gene. So it would start:

Zm00001eb093920_P001
Zm00001eb033650_P001
...

They might all just be P001s. But maybe not, I don't know. Perhaps some begin with P002.

On the call, I also talked about a version with the longest isoform. But let's skip that for now. Let's just focus on the lowest P00n number.
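For reference, the selection described above could be sketched roughly as follows. This is only an illustration, not the script in the repo: the function name `lowest_isoform_ids` is hypothetical, and it assumes every header starts with `>` followed by an ID of the form `<gene>_P<number>`, as in the two example IDs.

```python
import re

# Assumed header shape: ">Zm00001eb093920_P001 optional description".
# The gene ID is everything before the final "_P<digits>" of the first token.
HEADER_PATTERN = re.compile(r"^>((\S+)_P(\d+))(?:\s|$)")


def lowest_isoform_ids(fasta_lines):
    """Return one protein ID per gene: the one with the lowest P number.

    Genes are reported in the order they first appear in the input.
    Hypothetical helper, written for illustration only.
    """
    best = {}  # gene ID -> (isoform number, full protein ID)
    for line in fasta_lines:
        match = HEADER_PATTERN.match(line)
        if not match:
            continue  # sequence line, or a header that does not fit the assumption
        protein_id, gene_id, number = match.group(1), match.group(2), int(match.group(3))
        if gene_id not in best or number < best[gene_id][0]:
            best[gene_id] = (number, protein_id)
    return [protein_id for _, protein_id in best.values()]


def write_id_list(fasta_path, output_path):
    """Read a FASTA file and write the selected IDs, one per line."""
    with open(fasta_path) as handle:
        ids = lowest_isoform_ids(handle)
    with open(output_path, "w") as out:
        out.write("\n".join(ids) + "\n")
```

Comparing the numbers as integers rather than as strings means a gene whose isoforms start at P002 is still handled, which was the open question above.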

thanks!

MLi1104 commented 2 years ago

My solution:

My command line: [image] (program file name, then FASTA file name)

Results: [image] The isoforms should be written to a text file in the same directory.

edeutsch commented 2 years ago

great, thanks, looks promising! But I don't think it has been pushed to the repo? I see this:

$ git pull
Already up to date.

deutsch@WALDORF G:\Repositories\GitHub\PlantProteomes\SeqComparison\scripts
$ python lowest_pvalue_ML.py ..\proteomes\maize\original\Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.2.fa
None

Please make sure that you have a txt file in the same directory as     the program

not what you posted above. Would you commit and push?

And then I was going to try running the above program to create gene-only FASTA files for B73_v4, B73_v5, and W22, and then do a table comparison of those 6 files. Can we string together your programs to do that?

MLi1104 commented 2 years ago

Thanks for the reminder; I just pushed.

I can work on making that comparison file now!

edeutsch commented 2 years ago

Great, thanks, I pulled the latest code and it generates the txt file nicely. So the next step is to alter it so that it writes a FASTA file (instead of, or in addition to, the txt file). Once it can write FASTA files, we should be able to do that comparison.
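One possible shape for the FASTA-writing step, as a rough sketch only: the names `read_fasta` and `filter_fasta` are hypothetical, the set of IDs to keep is assumed to come from the existing txt output, and the 60-character line width is an arbitrary choice rather than anything the source file requires.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(lines, keep_ids, width=60):
    """Return FASTA output lines for records whose ID is in keep_ids.

    The record ID is assumed to be the first whitespace-separated token
    of the header. Sequences are re-wrapped to `width` characters.
    """
    output = []
    for header, sequence in read_fasta(lines):
        tokens = header.split()
        if not tokens or tokens[0] not in keep_ids:
            continue
        output.append(">" + header)
        for start in range(0, len(sequence), width):
            output.append(sequence[start:start + width])
    return output
```

Keeping the full header line for each selected record means any description after the ID is carried over into the gene-only file unchanged.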

thanks!