Open NathanSiemers opened 3 years ago
I tested removing the import calls in __init__.py and one other file, and tracer loaded correctly, but I haven't made a test run yet.
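One way to confirm that no import of Bio.Alphabet is left in the Python sources is to parse each file and list the matching import statements. The sketch below is only an illustration: `alphabet_imports` is a hypothetical helper name, not part of tracer or Biopython. It detects import statements only, so it would not see references stored in data files.

```python
import ast


def alphabet_imports(source, filename="<string>"):
    """List (line number, dotted name) for every import of Bio.Alphabet.

    Hypothetical helper for illustration; it inspects import statements
    only and does not execute the source.
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            # "import Bio.Alphabet.IUPAC" -> "Bio.Alphabet.IUPAC"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from Bio.Alphabet import IUPAC" -> "Bio.Alphabet.IUPAC"
            module = node.module or ""
            names = [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if name == "Bio.Alphabet" or name.startswith("Bio.Alphabet."):
                hits.append((node.lineno, name))
    return sorted(hits)
```

Running it over each .py file in the package (reading the file contents and passing them as `source`) should return an empty list once the imports are gone.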
Hi Nathan,
Thanks for this! I'll be happy to accept a PR that updates this if you'd like to submit one.
All the best,
Mike
I've spent several days working on a pull request. I removed the Bio.Alphabet dependencies and changed how the Seq objects are created so that they no longer depend on Bio.Alphabet.IUPAC. I have also been editing the Dockerfile to bring the packages up to modern versions and to run the tests. I can send you what I have so far, but there's an error in 'tracer test'. It seems that there's still an obscure call to Bio.Alphabet in the pickle dump/load that I find difficult to trace. Probably partly because I'm not a Python hacker, I can't resolve this one. Some help from the group would be appreciated.
(A fragment of the tracer test output is below. I can't find a remaining reference to Bio.Alphabet anywhere in the code base.)
[build] loading fasta file /tracer/test_data/results/cell1/expression_quantification/kallisto_index/cell1_transcriptome.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10) from 654 target sequences
[build] warning: replaced 3 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 781463 contigs and contains 113560426 k-mers
[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 131,104
[index] number of k-mers: 113,560,426
[index] number of equivalence classes: 460,618
[quant] running in paired-end mode
[quant] will process pair 1: /tracer/test_data/cell1_1.fastq /tracer/test_data/cell1_2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 1,135 reads, 1,042 reads pseudoaligned
[quant] estimated average fragment length: 106.333
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 52 rounds
Traceback (most recent call last):
File "/usr/local/bin/tracer", line 11, in molecule_type
as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
I think the error is so hard to trace because the Bio.Alphabet references are embedded in the pickled (.pkl) reference files of the test data, in directories like this:
https://github.com/Teichlab/tracer/tree/master/test_data/results/cell2/unfiltered_TCR_seqs
If that's true, then the error is due to a modern Python/Biopython environment not being able to load the old pickled reference test results.
N
(some text strings from the pkl file below)
S'alphabet'p154g0(cBio.AlphabetHasStopCodonp155g2Ntp156Rp157(dp158S'stop_symbol'p159S'*'p160sg154g0(cBio.Alphabet.IUPACExtendedIUPACProteinp161g2Ntp162Rp163sS'letters'
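The hypothesis above (that the .pkl files themselves name Bio.Alphabet classes) could be checked without unpickling anything, by walking the pickle opcodes with the standard-library `pickletools` module. This is a minimal sketch, not part of tracer: `pickled_globals` and `uses_bio_alphabet` are hypothetical helper names, and the handling of `STACK_GLOBAL` (protocol 4 and later) is a heuristic that assumes the module and class names are the two most recent string arguments.

```python
import pickletools


def pickled_globals(data):
    """Return the (module, name) pairs a pickle would import when loaded.

    Hypothetical helper: it only reads opcodes and never imports or
    executes anything, so it is safe to run without old dependencies.
    """
    found = set()
    recent = []  # recent string arguments, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as "module name" in one string
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent) >= 2:
            # heuristic: module and name were pushed just before
            found.add((recent[-2], recent[-1]))
        elif isinstance(arg, str):
            recent.append(arg)
    return found


def uses_bio_alphabet(data):
    """True if any referenced module is Bio.Alphabet or a submodule."""
    return any(
        module == "Bio.Alphabet" or module.startswith("Bio.Alphabet.")
        for module, _name in pickled_globals(data)
    )
```

Reading a test file in binary mode and passing its bytes to `uses_bio_alphabet` would indicate whether that file is one of those that needs regenerating.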
Thanks Nathan.
Yes, I think you're right that the error comes from test trying to load the old pickled files that were created with a previous version.
I think that a solution here would be to use an environment with the old BioPython to load those pickled files and then write them out as some kind of parseable text file (not as a pickle).
The pickles are representations of a Cell (https://github.com/Teichlab/tracer/blob/84f53e5ae0211822580be53841fc097fa8694419/tracerlib/core.py#L10) object and its Recombinant (https://github.com/Teichlab/tracer/blob/84f53e5ae0211822580be53841fc097fa8694419/tracerlib/core.py#L298) objects.
These classes aren't very complex so you could write out a text file containing their instance variables.
You could then switch to an environment with the new version of BioPython, recreate the objects using the values in your text file and then repickle them. Those should then be compatible and test should pass.
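The export and rebuild steps described above might look roughly like the following. This is a sketch under stated assumptions, not the tracer implementation: the `Recombinant` class here is a simplified hypothetical stand-in (the real classes are in tracerlib/core.py and have more attributes), JSON is just one possible text format, and `to_plain`, `export_text` and `rebuild` are made-up helper names.

```python
import json


class Recombinant:
    """Hypothetical stand-in for illustration, not the tracerlib class."""

    def __init__(self, contig_name, locus, productive):
        self.contig_name = contig_name
        self.locus = locus
        self.productive = productive


def to_plain(value, stringify=()):
    """Recursively reduce a value to JSON-friendly types.

    Instances of the types listed in `stringify` are stored as str(value);
    other objects are stored as a dict of their instance variables.
    """
    if isinstance(value, stringify):
        return str(value)
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, dict):
        return {str(k): to_plain(v, stringify) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_plain(v, stringify) for v in value]
    if hasattr(value, "__dict__"):
        return to_plain(vars(value), stringify)
    return str(value)  # fallback for anything else


def export_text(obj, stringify=()):
    """Old environment: write the instance variables out as JSON text."""
    return json.dumps(to_plain(obj, stringify), indent=2, sort_keys=True)


def rebuild(cls, text):
    """New environment: recreate an object from the exported text."""
    obj = cls.__new__(cls)  # skip __init__, then restore the attributes
    obj.__dict__.update(json.loads(text))
    return obj
```

In the old environment, passing the Biopython `Seq` class in `stringify` would store sequences as plain strings rather than as their internal attributes. Nested objects come back from `rebuild` as plain dicts and lists, so a real conversion would also need to recreate those (and any `Seq` objects) before calling `pickle.dump` on the result.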
Cheers,
Mike
Hello, I'm trying to build a running tracer on a more modern version of Python (3.8.10). Bio.Alphabet has since been removed from Biopython, and the recommendation is that calls to it (IUPAC) can be removed from most code without a problem.
Is it feasible to do this? Are there any known successes or issues with later versions of Python?
Thank you.
File "/usr/local/lib/python3.8/site-packages/tracer-0.5-py3.8.egg/tracerlib/tracer_func.py", line 29, in <module>
from Bio.Alphabet import IUPAC
File "/usr/local/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the \
`molecule_type