Open lzhangUT opened 3 years ago
However, if I manually copy all the content into a new file and save it in my working directory, the files seem to work and the error is gone.
But I have another issue when running the following code:
with pyhmmer.easel.SequenceFile("/dbfs/mnt/alphafold/LuxC.faa") as seq_file:
    seq_file.set_digital(alphabet)
    sequences = list(seq_file)
pipeline = pyhmmer.plan7.Pipeline(alphabet, background=background)
hits = pipeline.search_hmm(query=hmm, sequences=sequences)
ValueError: Could not parse file: Line 2: illegal character -
Hi @lzhangUT,
In the first snippet, I am not sure what is going wrong, but you can always manually set the file format to "stockholm",
since it looks like Easel doesn't detect the format properly:
with pyhmmer.easel.MSAFile("/dbfs/mnt/LuxC.sto", format="stockholm") as msa_file:
    msa_file.set_digital(alphabet)
    msa = next(msa_file)
In the second one, I suppose it's because you are trying to read a multiple alignment file, and reading one with a SequenceFile
fails by default. You need to manually allow the gaps:
with pyhmmer.easel.SequenceFile("/dbfs/mnt/alphafold/LuxC.faa", ignore_gaps=True) as seq_file:
    seq_file.set_digital(alphabet)
    sequences = list(seq_file)
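To check whether gaps really are what the parser is complaining about, it can help to look at the offending lines directly. The snippet below is only a rough sketch using the standard library, not part of pyhmmer; `find_gap_lines` and its parameters are hypothetical names made up for illustration. It lists the sequence lines (skipping `>` headers) that contain gap characters, along with their line numbers.

```python
def find_gap_lines(path, gap_chars="-.", limit=5):
    """Return (line_number, text) for sequence lines containing gap characters.

    Hypothetical diagnostic helper: it does not parse FASTA properly, it only
    skips header lines (starting with '>') and scans everything else.
    """
    found = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                continue  # header lines may legitimately contain '-'
            if any(ch in line for ch in gap_chars):
                found.append((number, line.rstrip("\n")))
                if len(found) >= limit:
                    break  # stop early, a few examples are enough
    return found
```

If this reports line 2, the file does contain gap characters in its first record; if line 2 is not something you expect in a FASTA file at all, the file contents may not be what you think they are.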
Hi @althonos, thanks for your response. First of all, I think LuxC.faa is a FASTA file, i.e., a sequence file, not a multiple alignment file. Second, I was following the tutorial on your GitHub, and the data is from your GitHub as well. Even after I add 'ignore_gaps=True', the same error is still there.
with pyhmmer.easel.SequenceFile("/dbfs/mnt/alphafold/LuxC.faa", ignore_gaps=True) as seq_file:
    seq_file.set_digital(alphabet)
    sequences = list(seq_file)
ValueError: Could not parse file: Line 2: illegal character -
and the error is for the line marked with **.
Hi, I was following your tutorial on multiple sequence alignment (MSA) to HMM. I have downloaded your example data into my working directory, and I can see the two files (LuxC.faa and LuxC.sto) there: [FileInfo(path='dbfs:/mnt/LuxC.faa', name='LuxC.faa', size=153510), FileInfo(path='dbfs:/mnt/LuxC.sto', name='LuxC.sto', size=150686),
When I tried to run this code:
It gives me an error like this: ValueError: Could not determine format of file: '/dbfs/mnt/LuxC.sto'
I am not sure where it went wrong; the installation and the first two commands in the tutorial work fine. Thanks for your help.
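When the format cannot be determined, a quick look at the first non-blank line of the file often narrows things down: a Stockholm file starts with `# STOCKHOLM 1.0` and a FASTA record starts with `>`. Below is a minimal standard-library sketch of that check; `sniff_format` is a hypothetical helper written for illustration, not a pyhmmer or Easel function, and the HTML case is only a guess at one way a download can go wrong.

```python
from pathlib import Path


def sniff_format(path, n_bytes=256):
    """Guess a file's format from its first non-blank line.

    Hypothetical helper: returns "stockholm", "fasta", "html",
    "unknown", or "empty". It only inspects the first n_bytes bytes.
    """
    head = Path(path).read_bytes()[:n_bytes].decode("utf-8", errors="replace")
    for line in head.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith("# STOCKHOLM"):
            return "stockholm"
        if stripped.startswith(">"):
            return "fasta"
        if stripped.lower().startswith(("<!doctype", "<html")):
            return "html"  # e.g. a web page saved instead of the raw file
        return "unknown"
    return "empty"
```

Running it on both downloaded files, for example `sniff_format("/dbfs/mnt/LuxC.sto")`, should show whether the saved content matches the extension before handing the file to the parser.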