PureseqTM / PureseqTM_Package

Stand-alone software package of PureseqTM for transmembrane topology prediction from amino acid sequence only.
http://pureseqtm.predmp.com

Unexpected: filenames instead of gene names are used #7

Open richelbilderbeek opened 4 years ago

Dear PureseqTM maintainer.

I would like to report some unexpected behavior.

For this example I will use example/1bhaA.fasta. If I print it to the screen ...

cat example/1bhaA.fasta 

I see that the FASTA file contains one protein named 1bhaA:

>1bhaA
QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF

Now I run PureseqTM.sh from a local clone of this repo on that example file ...

./PureseqTM.sh -i example/1bhaA.fasta

... when taking a look at the output ...

cat 1bhaA_PureTM/1bhaA.top

I get exactly what I expected:

>1bhaA
QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF
00000000001111111111111111110000000000000001111111111111111111100000000

What I expect is that the first line holds the gene/protein name. In this case, this is correctly set to 1bhaA.

Now I copy the example file:

cp example/1bhaA.fasta example/report.fasta

Run it again:

./PureseqTM.sh -i example/report.fasta

... when taking a look at the output ...

cat report_PureTM/report.top

... I get something unexpected:

>report
QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF
00000000001111111111111111110000000000000001111111111111111111100000000

What is unexpected is that the protein name has changed to report (the filename) instead of keeping its original name, 1bhaA.
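
A minimal sketch of how this could be checked automatically, assuming that the first line of the .top file is supposed to equal the header line of the input FASTA file. The function name same_header is hypothetical and not part of PureseqTM:

```shell
#!/bin/sh
# Hypothetical helper (not part of PureseqTM): succeeds only if the first
# line of the FASTA file equals the first line of the .top file.
same_header() {
  # $1: path to the input FASTA file
  # $2: path to the .top output file
  [ "$(head -n 1 "$1")" = "$(head -n 1 "$2")" ]
}
```

With the files from this report, `same_header example/1bhaA.fasta 1bhaA_PureTM/1bhaA.top` would succeed, while `same_header example/report.fasta report_PureTM/report.top` would fail, because the first compares >1bhaA with >1bhaA and the second compares >1bhaA with >report.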

When using PureseqTM_proteome.sh instead, the behavior (albeit slightly different) is as expected.