Closed wt12318 closed 3 years ago
Just as a warning, we never tested netMHCpan4.1 with NeoRecoPo, and since your netMHCpan version is used for predicting wild-type binding in NeoRecoPo, the format difference might cause problems in the processing.
That being said, it looks like the error you're getting is before this step. I think the issue is probably coming from your test.txt not being in the same format as NeoRecoPo would expect the output of the previous step (NeoPredPIpe) to be in. For example in your first line, I'm not sure what the last few values stand for following the Identity column - is the 0 in the end meaning the neoantigen is not novel, or is that the 1 before?: line462_NM_0153 0.003614 14569.59 8.548 1 0 NeoRecoPo would filter this out as not novel, but I'm not sure if that's what you'd intend. Your 7th line on the other hand looks more like how I'd expect the table: line462_NM_0153 0.120614 308.75 1.019 <= WB 1 If this 0 in the first line was introduced by some accident, I would suggest to first filter for only WB and SB neoantigens (either manually, or by not using the -a option in NeoPredPipe) before NeoRecoPo, that should ensure all rows have the same number of elements making it easier to bring the table to netMHCpan4.0 format.
Also, based on the beginning of your lines: TCGA-02-0047-01A 0 line462 it seems to me you've run NeoPredPipe with the -c option used in multi-region vcf files, even though TCGA files are usually single region. I don't think this will produce an error, but it would be more appropriate to not use the -c option.
Thank you for your reply. I tried to filter for only WB and SB neoantigens and remove the second column: test.txt
and tried:
NeoRecoPo.py -d --neopred_in=test.txt --neoreco_out=./ --fastas=./test
NeoRecoPo.py -d --neopred_in=test.txt --neoreco_out=./ --fastas=./test
INFO: Begin.
Traceback (most recent call last):
File "/public/slst/home/wutao2/software/NeoPredPipe/NeoRecoPo.py", line 135, in <module>
main()
File "/public/slst/home/wutao2/software/NeoPredPipe/NeoRecoPo.py", line 98, in main
preds.ConstructWTFastas()
File "/slst/home/wutao2/software/NeoPredPipe/StandardPredsClass.py", line 195, in ConstructWTFastas
self.__addToFastaFile()
File "/slst/home/wutao2/software/NeoPredPipe/StandardPredsClass.py", line 169, in __addToFastaFile
seqID, seq = self.__extractSeq(sam, fasta_head, epitopeLength) # WT seqID and seq
File "/slst/home/wutao2/software/NeoPredPipe/StandardPredsClass.py", line 269, in __extractSeq
WTepiSeq = ExtractSeq(WT[1], pos, epitopeLength)
IndexError: list index out of range
also the same problem.
And I look at the code, it seems that the seq_recode
in this sample Neoantigen predicted file (like line462;NM_0153
) are not in the fasta
files:
>>> for seq_record in SeqIO.parse(preds.fastas[sample], 'fasta'):
... seqIdentifier = ';'.join(seq_record.id.split(';',3)[0:2])[0:16]
... if identifier in seqIdentifier:
... print("ok")
... else:
... print("no ok")
...
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
no ok
So the WT
list is empty, and the operation WT[1]
produced IndexError
in the __extractSeq
function:
def __extractSeq(self, sample, identifier, epitopeLength):
'''
Extracts the sequence from the *.tmp.epi.fasta file and reverts the sequence back.
:return: the wildtype sequence and header
'''
WT = []
Mut = []
count=0
for seq_record in SeqIO.parse(self.fastas[sample], 'fasta'):
seqIdentifier = ';'.join(seq_record.id.split(';',3)[0:2])[0:16]
if identifier in seqIdentifier:
count += 1
if 'WILDTYPE' in seq_record.id.split(';')[2]:
WT.append(seq_record.id)
WT.append(seq_record)
else:
try:
pos = int(seq_record.id.replace(";;", ";").split(";")[5]) - 1
except ValueError:
pos = int(seq_record.id.replace(";;", ";").split(";")[6]) - 1
Mut.append(seq_record.id)
Mut.append(seq_record)
if count==2:
break
WTepiSeq = ExtractSeq(WT[1], pos, epitopeLength)
return(WT[0], WTepiSeq)
But I don't know how to solve this problem.
Thank you.
Hi! I think your input (test.txt) looks good now, but indeed there seems to be something fishy going on regarding the fasta file and the neoantigen table. I haven't seen this before, based on the fasta file you've shared, you shouldn't have a neoantigen with ID (V19) line462_NM_0153 in the neoantigen table - it should be line462_NM_0181. I don't see how this could have happened, as in the NeoPredPipe step, the pipeline uses the fasta file to supply a name to netMHCpan (e.g. line462;NM_0153 )and that's the name put into column V19. So there shouldn't be a discrepancy between what is present in the fasta and what is found in the neoantigen table, regardless of netMHCpan version. Is it possible that you had several runs of NeoPredPipe, and the fasta files are from a slightly different run than the neoantigen table? (It can be up to random chance which out of two transcript identifiers (NM_18156 and NM_015378) is listed first and used in the ID, so two separate runs can differ in that and cause problems)
I would suggest making a new, clean run of NeoPredPipe (after deleting temporary files, like fastas, avannotated, etc. or in a separate folder). Then before running NeoRecoPo, first have a quick look by eye if the fasta headers and the neoantigen table IDs are the same - I don't see a reason why they wouldn't be.
Hi, Sorry to bother again. I used following code to perform neoantigen recognition potential predictions:
But some errors had happened:
And the
tmp
files are empty:The following are my input files:
test.txt The
./test
dir has the twofasta
files: TCGA-02-0047-01A.fasta.txt TCGA-02-0047-01A.reformat.fasta.txtThank you.