Closed LiShuhang-gif closed 1 year ago
I've tried the --reads_fasta option with a compressed fastq file, which leads to another error as follows:
zz 3103
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/multiprocess/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/src/ins.py", line 149, in examine_regions
ins_list.extend(self.examine_region(region, bam=bam, reads_fasta=reads_fasta))
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/src/ins.py", line 183, in examine_region
ins_list.extend(self.extract_ins(aln, region, reads_fasta=reads_fasta))
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/src/ins.py", line 360, in extract_ins
ins[7] = INS.extract_neighbour_seqs(self.get_seq(reads_fasta, aln.query_name, aln.is_reverse), rpos, len(ins_seq), self.w)
TypeError: object of type 'NoneType' has no len()
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/fan_lab/shali/yes/bin/straglr.py", line 77, in <module>
main()
File "/public/home/fan_lab/shali/yes/bin/straglr.py", line 62, in main
ins = ins_finder.find_ins()
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/src/ins.py", line 80, in find_ins
batched_results = parallel_process(self.examine_regions, batches, self.nprocs)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/src/utils.py", line 20, in parallel_process
results = p.map(func, args)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pathos/multiprocessing.py", line 139, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/multiprocess/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/multiprocess/pool.py", line 657, in get
raise self._value
TypeError: object of type 'NoneType' has no len()
Hi @LiShuhang-gif
Thanks for trying Straglr. Would you mind re-cloning the current Straglr repo to see if you get any results?
There is a new option, --tmpdir, where you can specify the location for temporary files. From the error messages it seems like your tmp space is used up. You need to find a location big enough for the temporary files that Straglr generates, and specify that location using --tmpdir.
The --reads_fasta option has been tested: as long as the fastq sequences have been indexed by tabix, they should be accessible by pysam. But they have to be bgzipped to be indexable by tabix.
Hello, thanks for your reply. But, as far as I know, tabix can't be used to index fastq files, since there is no fastq preset in tabix -p, which is used to specify the file type.
tabix: option requires an argument -- 'p'
Program: tabix (TAB-delimited file InderXer)
Version: 0.2.5 (r1005)
Usage: tabix <in.tab.bgz> [region1 [region2 [...]]]
Options: -p STR preset: gff, bed, sam, vcf, psltbl [gff]
Can you show me how you handle fastq files, preferably with a specific Linux command line? Thank you very much!
Sorry, it should be samtools, not tabix: samtools faidx or samtools fqidx.
Hello, I have already specified the path for tmp files using the --tmpdir option, used bgzip to compress my fastq files, and used samtools fqidx to index them. However, a new error message has now appeared:
Traceback (most recent call last):
File "/public/home/fan_lab/shali/yes/bin/straglr.py", line 4, in <module>
__import__('pkg_resources').run_script('straglr==1.2.0', 'straglr.py')
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1455, in run_script
exec(script_code, namespace, namespace)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/straglr-1.2.0-py3.7.egg/EGG-INFO/scripts/straglr.py", line 80, in <module>
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/straglr-1.2.0-py3.7.egg/EGG-INFO/scripts/straglr.py", line 50, in main
TypeError: __init__() got an unexpected keyword argument 'min_cluster_size'
The script I used is as follows:
straglr.py C1.sort.filter.bam hg38_22_XYM.fa straglr_scan_min_ins20.tsv \
--min_ins_size 20 \
--genotype_in_size \
--min_support 2 \
--nprocs 16 \
--tmpdir /public/home/fan_lab/shali/VNTR/Straglr/C1_ins20/tmp \
--reads_fasta ../combined_C1.fq.gz
According to the error message, it seems that there is something wrong with min_cluster_size, but I didn't set this parameter in my script.
Any suggestion about solving this error? I'll try anything you suggest right away. Thanks again!
It seems like you are not running the latest code (in the src directory), because min_cluster_size is a newly added parameter and the error message says it's not recognized.
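One way to confirm this kind of mismatch is to check which copy of the package Python actually imports, and whether a given constructor accepts the new keyword. The sketch below is a generic diagnostic using only the standard library. The helper names module_origin and accepts_keyword are hypothetical, and the package name src is taken from the traceback paths earlier in this thread (site-packages/src/ins.py), not from any documented Straglr API.

```python
import importlib.util
import inspect


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def accepts_keyword(func, keyword):
    """Rough check: can func be called with the given keyword argument?"""
    params = inspect.signature(func).parameters
    if keyword in params:
        return True
    # A **kwargs parameter accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    # "src" is the package name that appears in the tracebacks above
    print("src is imported from:", module_origin("src"))
```

Run in the same environment as straglr.py, this prints the path of the src package that gets picked up. If that path points at an older copy than the installed script, removing the stale copy before reinstalling is a reasonable next step.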
Why don't you try running the small test data set I've put up in the test directory and see if you can get the expected output (genome_scan.*)? The command is simply:
straglr.py test.bam /your/path/to/hg38.fa your_output_prefix
Note there will be 2 files generated, one a bed file without all the read names and details, and the other the old tsv file. So for the third parameter in running Straglr you should specify the output prefix without the tsv extension.
Hi, thanks for your prompt reply! Can you tell me how to update Straglr to the latest version? The following command does not seem to work on my server:
pip install git+https://github.com/bcgsc/straglr.git#egg=straglr
When I run this command, I get some error messages:
(base) [shali@vm-login01 biosoft]$ pip install git+https://github.com/bcgsc/straglr.git#egg=straglr
Collecting straglr
Cloning https://github.com/bcgsc/straglr.git to /tmp/pip-install-4ctnhlfw/straglr_9e151966f37b49ff99dd3e0885f5c427
Running command git clone -q https://github.com/bcgsc/straglr.git /tmp/pip-install-4ctnhlfw/straglr_9e151966f37b49ff99dd3e0885f5c427
fatal: unable to access 'https://github.com/bcgsc/straglr.git/': OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to github.com:443
WARNING: Discarding git+https://github.com/bcgsc/straglr.git#egg=straglr. Command errored out with exit status 128: git clone -q https://github.com/bcgsc/straglr.git /tmp/pip-install-4ctnhlfw/straglr_9e151966f37b49ff99dd3e0885f5c427 Check the logsfor full command output.
ERROR: Could not find a version that satisfies the requirement straglr (unavailable)
ERROR: No matching distribution found for straglr (unavailable)
So I tried downloading the ZIP and unzipping it. Then I ran the following command:
python setup.py build
python setup.py install
I think the installation seems to be successful. But according to what you said, this is not the latest version and I would like to know how to update Straglr to the latest version. Thanks again!
My version of Straglr is 1.2.0, which is also shown in the error message. The same error occurred when I ran the test data.
(base) [shali@vm-login02 test]$ straglr.py test.bam /public/home/fan_lab/shali/reference/hg38_22_XYM.fa ./bam/try
Traceback (most recent call last):
File "/public/home/fan_lab/shali/yes/bin/straglr.py", line 4, in <module>
__import__('pkg_resources').run_script('straglr==1.2.0', 'straglr.py')
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1455, in run_script
exec(script_code, namespace, namespace)
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/straglr-1.2.0-py3.7.egg/EGG-INFO/scripts/straglr.py", line 80, in <module>
File "/public/home/fan_lab/shali/yes/lib/python3.7/site-packages/straglr-1.2.0-py3.7.egg/EGG-INFO/scripts/straglr.py", line 50, in main
TypeError: __init__() got an unexpected keyword argument 'min_cluster_size'
I don't know if it was my installation method (python setup.py build and python setup.py install) that led to this error.
After changing HTTPS to HTTP, I successfully installed Straglr. I tried to run the data in the test directory and got the same results as in genome_scan.tsv.
Currently, I'm trying to run Straglr on my own data, using both the --reads_fasta and --tmpdir parameters.
Thanks again! It was very helpful to me. I'll contact you if I have any questions.
Glad to hear that it's working now, at least for the test data :)
Let me know if there are any issues. BTW, --min_ins 50 may be a bit too low. I'm not sure if you are analyzing PacBio CCS/HiFi or Nanopore reads, but Nanopore reads are noisier, so you may get many 50bp insertions that are not real.
Yeah, but the results turned out to be a little different than I expected. With the --reads_fasta and --tmpdir options, Straglr reports fewer tandem repeats (I thought it would find more with the fastq file). I got 7,821 tandem repeats without the fastq file, but only 7,316 when providing it. And the log file is empty.
I wonder if this is normal? Why did Straglr end up reporting fewer tandem repeats with the fastq file? Was it validating the tandem repeats it found more rigorously? Thanks!
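To see which loci account for a difference like this, the two result files can be compared directly. The sketch below assumes the TSV output is tab-separated, that its first three columns are chromosome, start, and end, and that header lines begin with #. Those are assumptions about the file layout, and the function names and file names are made up for illustration.

```python
def load_loci(lines):
    """Collect unique (chrom, start, end) tuples from tab-separated lines.

    Assumes columns 1-3 are chrom, start, end and that '#' marks header lines.
    """
    loci = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        loci.add((chrom, int(start), int(end)))
    return loci


def compare_loci(lines_a, lines_b):
    """Return the loci unique to each input and the loci they share."""
    a, b = load_loci(lines_a), load_loci(lines_b)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


if __name__ == "__main__":
    # File names are placeholders for the two runs being compared
    with open("without_fastq.tsv") as fa, open("with_fastq.tsv") as fb:
        result = compare_loci(fa, fb)
    for key, loci in result.items():
        print(key, len(loci))
```

This matches loci by exact coordinates, so loci whose boundaries shift slightly between runs will show up as unique to each file rather than shared.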
There should be only 13 reads that carry an expansion at the ATXN10 locus in the test data when running Straglr with the default parameters. Did you run with different parameters?
The current version will not output any messages to stdout; it only does so if you run with --debug. Note that when --debug is turned on, the temporary files are not cleaned up (you will have to remove them manually).
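Manual cleanup after a --debug run can be scripted. The sketch below lists, and optionally deletes, files in the directory passed to --tmpdir whose names start with tmp. That prefix is an assumption based on the default prefix used by Python's tempfile module and on the file name tmpdajybv79 that appears in debug output in this thread; the helper name remove_leftovers is hypothetical. It defaults to a dry run, and it should not be run while a Straglr job is still writing to the same directory.

```python
from pathlib import Path


def remove_leftovers(tmpdir, pattern="tmp*", dry_run=True):
    """List regular files in tmpdir matching pattern; delete them unless dry_run.

    The "tmp*" pattern is an assumption about how the temporary files are named.
    Returns the sorted list of matching paths.
    """
    files = sorted(p for p in Path(tmpdir).glob(pattern) if p.is_file())
    if not dry_run:
        for path in files:
            path.unlink()
    return files


if __name__ == "__main__":
    import sys
    # Usage: python clean_tmp.py /path/to/tmpdir [--delete]
    found = remove_leftovers(sys.argv[1], dry_run="--delete" not in sys.argv[2:])
    for path in found:
        print(path)
```

Running it once without --delete and reviewing the listed files before deleting anything is the safer order.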
Hi, the results I mentioned above are based on my own data, not the test data. I thought providing the fastq file would enable Straglr to find more tandem repeat loci. But in reality there are fewer, which really confused me.
By the way, given that the current version does not output any messages to stdout, does that mean I will not know if an error is reported unless I add the --debug option?
Thanks!
Yes, you have to turn on --debug to see the warning messages. The numbers you reported are the numbers of loci, not the numbers of reads, right?
There is a potential glitch in using the read sequences I just thought of now. I'll need to check that. So I suggest taking the bam file results as they are for now.
Yes, the numbers I reported are the numbers of loci, not the numbers of reads. Unfortunately, with the debug option, the error message appears again.
trf input /public/home/fan_lab/shali/VNTR/Straglr/C1_ins20_fasta_debug/tmp/tmpdajybv79
problem getting seq1 m64030_210322_005835/42141537/ccs ['chr16', 34584353, 34584354, 'AATGGAATCATCATCGAATGGAATCG,ATCGAATGGACTCGAATGGAATCATCATCGAATGGAATCGAATGGAATC,CGAATGGAAACATCATCAATGGAAT,CGAATGGAATCATCATCGAATGGAAT,CGAATGGAATCGAATGGAATCACAT'] None None None
problem getting seq1 m64031_210323_071702/119540563/ccs ['chr16', 34584353, 34584354, 'AATGGAATCATCATCGAATGGAATCG,ATCGAATGGACTCGAATGGAATCATCATCGAATGGAATCGAATGGAATC,CGAATGGAAACATCATCAATGGAAT,CGAATGGAATCATCATCGAATGGAAT,CGAATGGAATCGAATGGAATCACAT'] None None None
Given that this seems to be a warning message rather than an error message, I wonder if it might adversely affect the results? Thanks!
Yes, these messages come up whenever the read coordinates that the script computes do not lead to successful extraction of the subsequence, usually as a result of split alignments.
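To gauge how much these warnings matter for a given run, it can help to count how many reads are affected at each locus. The sketch below parses lines shaped like the two warnings quoted above. The pattern is inferred only from those two lines, so it is an assumption about the debug output format, and the function name tally_warnings is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the warning lines quoted in this thread (an assumption):
# problem getting seq1 <read_name> ['<chrom>', <start>, <end>, '<motifs>'] ...
WARNING = re.compile(
    r"^problem getting seq\S*\s+(?P<read>\S+)\s+"
    r"\['(?P<chrom>[^']+)', (?P<start>\d+), (?P<end>\d+)"
)


def tally_warnings(lines):
    """Count warning lines per (chrom, start, end) locus."""
    counts = Counter()
    for line in lines:
        match = WARNING.match(line)
        if match:
            locus = (match["chrom"], int(match["start"]), int(match["end"]))
            counts[locus] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python tally_warnings.py debug.log  (log file name is an example)
    with open(sys.argv[1]) as handle:
        for locus, n in tally_warnings(handle).most_common():
            print(*locus, n, sep="\t")
```

Comparing these counts with the read support reported for the same loci gives a rough sense of whether the skipped reads could change a genotype call.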
Hi, I'm trying to use Straglr to call tandem repeats. After the program finished, I got the ins_merged.bed and tsv files. However, when I checked the log file, I found some error messages:
In another unfinished program, the error message is as follows:
I wonder if this will affect the accuracy and credibility of the resulting files. And I want to know how to solve these error messages. Any suggestions? Thanks ever so much!