Open Mustafa-Albekaa opened 4 years ago
Hi Mustafa,
Without looking, this sounds suspiciously like a change in the Biopython API. There have been two attributes which contain the same value, Bio.Seq.name and Bio.Seq.id. It might be that .id has finally been dropped. Just a guess for now. If my suspicion is correct, there should be a fix with some version pinning to avoid this.
On that note, could you provide the output of pip freeze?
On 25 Aug 2020, at 7:22 pm, Mustafa-Albekaa notifications@github.com wrote:
I've been having some trouble simulating HiC reads, and after an hour of troubleshooting I think I've identified the issue.
This is the command I've been running, and the error I've been running into.
sim3C --dist uniform -n 10000 -l 150 -e Sau3AI -m hic --profile-name ${genome}_simhic_profile.tsv $genome.fasta ${genome}_simhic.fastq
ERROR | 2020-08-25 02:09:05,237 | main | 'Seq' object has no attribute 'id'
Traceback (most recent call last):
  File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/command_line.py", line 213, in main
    args.num_pairs, args.method, args.read_length, **kw_args)
  File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/simulator.py", line 307, in __init__
    create_cids=create_cids, linear=linear)
  File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/community.py", line 507, in __init__
    random_state, create_cids, linear))
  File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/community.py", line 82, in __init__
    self.sites = CutSites(enzyme, seq.seq, self.random_state, linear=linear)
  File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/site_analysis.py", line 63, in __init__
    raise NoCutSitesException(template_seq.id, str(enzyme))
AttributeError: 'Seq' object has no attribute 'id'
I believe the problem is that template_seq does not have an id attribute. Using type() on template_seq identifies it as a Bio.Seq.Seq object.
I've removed the sequences that were causing the issue and am now able to run the program, but this bug meant I was not able to easily identify which sequences did not have cut sites.
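For anyone hitting the same problem, here is a minimal sketch of one way to list the reference sequences that lack a cut-site before running the simulation. It is not part of sim3C: the helper names `read_fasta` and `ids_without_site` are made up for illustration, it uses only the Python 3 standard library, and it assumes the Sau3AI recognition sequence GATC. It also treats every sequence as linear, so a site spanning the junction of a circular replicon would be missed, and it does not handle ambiguity codes.

```python
import io


def read_fasta(handle):
    """Yield (seq_id, sequence) pairs from a FASTA-formatted text handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The id is the header text up to the first whitespace.
            fields = line[1:].split()
            seq_id, chunks = (fields[0] if fields else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


SAU3AI_SITE = "GATC"  # assumed recognition sequence for Sau3AI


def ids_without_site(handle, site=SAU3AI_SITE):
    """Return the ids of sequences that do not contain the site (linear search)."""
    return [seq_id for seq_id, seq in read_fasta(handle)
            if site not in seq.upper()]


# Hypothetical two-record FASTA used only as a demonstration.
demo = io.StringIO(">contig1 demo\nAAGATCTT\n>contig2\nAAAATTTT\n")
print(ids_without_site(demo))
```

With a real file, the same function could be called on `open("genome.fasta")` in place of the in-memory demo.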
Well, ignore what I just said; this seems to be entirely a bug in sim3C.
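To illustrate the failure pattern in the traceback, here is a rough sketch using stand-in classes rather than Biopython itself. `Seq`, `SeqRecord`, and `NoCutSitesException` below are simplified, hypothetical versions written only for this example, and the "fixed" function shows one possible approach (passing the record's id alongside the bare sequence). It is not necessarily what the actual sim3C fix does.

```python
class Seq(object):
    """Hypothetical stand-in for a bare sequence: letters only, no identifier."""

    def __init__(self, data):
        self.data = data


class SeqRecord(object):
    """Hypothetical stand-in for a record that wraps a sequence with an id."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


class NoCutSitesException(Exception):
    """Simplified stand-in; the message format is an assumption."""

    def __init__(self, seq_id, enzyme):
        super().__init__(
            "sequence {} had no cut-sites for enzyme {}".format(seq_id, enzyme))
        self.seq_id = seq_id


def check_sites_buggy(template_seq, enzyme, site):
    # Mirrors the pattern in the traceback: only the bare sequence is passed
    # in, so reading template_seq.id raises AttributeError instead of the
    # intended NoCutSitesException.
    if site not in template_seq.data:
        raise NoCutSitesException(template_seq.id, enzyme)


def check_sites_fixed(seq_id, template_seq, enzyme, site):
    # One possible remedy: the caller supplies the record id explicitly.
    if site not in template_seq.data:
        raise NoCutSitesException(seq_id, enzyme)


record = SeqRecord(Seq("AAAATTTT"), "contig2")
try:
    check_sites_fixed(record.id, record.seq, "Sau3AI", "GATC")
except NoCutSitesException as err:
    print(err)
```

The point is only that the identifier lives on the record, not on the sequence it wraps, so the error message has to be built from the record.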
Hello Matthew,
I hope this will be fixed soon! sim3C is very useful and has been quite easy to use.
Output for pip freeze, in case you still need it, is:
biopython==1.76
BUSCO==3.1.0
certifi==2019.11.28
enum34==1.1.10
funcsigs==1.0.2
iced==0.4.2
intervaltree==3.0.2
llvmlite==0.31.0
numba==0.47.0
numpy==1.16.6
PyYAML==5.3.1
scipy==1.2.3
sim3C @ git+https://github.com/cerebis/sim3C@43e2ccfabf55f9ddb84754e9b29b8791d4bd34c0
singledispatch==3.4.0.3
six==1.15.0
sortedcontainers==2.2.2
tqdm==4.45.0
I have committed a fix to handle this issue (9830b3c0b0a4f50e90922c3cbf061dbb076d72a6).
Unfortunately, this will perhaps not be the logic you are hoping to see. Reference sequences which do not contain a cut-site will be ignored in the simulation, and if a cell contains only that replicon, it too will be ignored.
Regarding how sim3C simulates Hi-C reads, a sequence which contains no cut-sites will not produce read-pairs with proximity ligations. It would, however, still be capable of producing spurious read-pairs (noise). I will leave this issue open, but modify the title to reflect that this should be addressed in future.
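The filtering behaviour described above can be pictured with a short sketch. The data layout here (a dict mapping cell names to dicts of replicon name and sequence) and the function name `drop_siteless` are assumptions made purely for illustration; sim3C's internal community structures are different. It assumes the GATC site and a linear search.

```python
def drop_siteless(community, site="GATC"):
    """Return a copy of the community without site-free replicons.

    community: dict mapping cell name -> dict of replicon name -> sequence
    (a hypothetical layout, not sim3C's own). Cells left with no replicons
    are dropped as well.
    """
    kept = {}
    for cell, replicons in community.items():
        with_sites = {name: seq for name, seq in replicons.items()
                      if site in seq.upper()}
        if with_sites:  # a cell with only site-free replicons is ignored
            kept[cell] = with_sites
    return kept


# Made-up toy community: cellB has a single replicon without any GATC.
community = {
    "cellA": {"chr": "AAGATCTT", "plasmid": "AAAATTTT"},
    "cellB": {"chr": "CCCCCCCC"},
}
print(drop_siteless(community))
```

In this toy case only cellA's chromosome survives, which mirrors the behaviour described: the site-free plasmid is ignored, and cellB disappears entirely because nothing in it can be cut.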