Hoohm / CITE-seq-Count

A tool for obtaining UMI counts from a single-cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

error running cite-seq-count #150

Open fouerghi opened 3 years ago

fouerghi commented 3 years ago

Hi,

I used to be able to run CITE-seq-Count 1.3 just fine, but when I tried to process the results of a new experiment, using Python 3.7 and then 3.8, I got this error both times:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Users/opt/miniconda3/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/opt/miniconda3/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/opt/miniconda3/lib/python3.8/site-packages/multiprocess/pool.py", line 592, in _handle_results
    cache[job]._set(i, obj)
  File "/Users/opt/miniconda3/lib/python3.8/site-packages/multiprocess/pool.py", line 778, in _set
    self._error_callback(self._value)
TypeError: '_io.TextIOWrapper' object is not callable

fjrossello commented 3 years ago

Hi,

I get the same error using both v1.4.5 (Python 3.8) and v1.4.3 (Python 3.6.5). I am trying to process a TotalSeqC experiment using 4 tags with a total of ~106M reads. I tried what has been previously discussed in issue #112 and issue #37 without success. Any ideas what might be happening?

I run CITE-seq-Count with the following parameters:

CITE-seq-Count -R1 $SAMPLE_FASTQ_R1 -R2 $SAMPLE_FASTQ_R2 -t $SAMPLE_TAGS -cbf 1 -cbl 16 -umif 17 -umil 26 --bc_collapsing_dist 0 --umi_collapsing_dist 1 --max-errors 1 --start-trim 10 -cells 0 -T 32 -wl $CITE_SEQ_COUNTS_DIR/barcodes.tsv -o $CITE_SEQ_COUNTS_DIR/$SAMPLE_NAME

As mentioned I get the following error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/homevol/frossello/anaconda3/envs/biotools/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/homevol/frossello/anaconda3/envs/biotools/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/homevol/frossello/anaconda3/envs/biotools/lib/python3.8/site-packages/multiprocess/pool.py", line 592, in _handle_results
    cache[job]._set(i, obj)
  File "/homevol/frossello/anaconda3/envs/biotools/lib/python3.8/site-packages/multiprocess/pool.py", line 778, in _set
    self._error_callback(self._value)
TypeError: '_io.TextIOWrapper' object is not callable

stdout stalls at this stage:

Loading whitelist
Counting number of reads
Started mapping
Processing 106,892,906 reads
CITE-seq-Count is running with 32 cores.

Thanks in advance.
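A note on reading this traceback: the last frame shows the pool calling self._error_callback(self._value), a step it only reaches after a worker task has already raised an exception. The TypeError then says that the object registered as the error callback is a text stream rather than a function. The worker's real exception is therefore never printed, and the thread that collects results dies, which would be consistent with the stalled output. The sketch below reproduces only that last step using the standard library. The stream and the worker error are stand-ins chosen for illustration, not CITE-seq-Count code, and how the tool actually registers its callback is an assumption here.

```python
import io

# Stand-in for a text stream such as sys.stderr (assumption: an object of this
# type was supplied where the pool expects an error-handling function).
stream = io.TextIOWrapper(io.BytesIO())

# Stand-in for the exception raised inside a worker; the real one is unknown.
worker_error = ValueError("hypothetical failure inside a worker")


def deliver(error_callback, error):
    """Mimic the pool's last step: hand the worker's error to the callback.

    Returns the TypeError message if the callback is not callable, else None.
    """
    try:
        error_callback(error)
        return None
    except TypeError as exc:
        return str(exc)


# Calling a text stream fails, and the worker's own error is lost.
message = deliver(stream, worker_error)
print(message)

# A callable, by contrast, receives the original error and can report it.
seen = []
deliver(seen.append, worker_error)
print(repr(seen[0]))
```

In practice this means the TypeError is a symptom: the thing to look for is whatever made a worker fail on the input data.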

Hoohm commented 3 years ago

Hello,

what happens if you run fewer reads? Do you still get the same error?


fjrossello commented 3 years ago

Hi,

Thanks for your prompt reply.

I have just run it using -n 20000000 and -n 2000000, each combined with -T 16 and -T 32, and got the same output.

Please let me know if there's anything else you would like me to try.

Cheers

Hoohm commented 3 years ago

I'm assuming it might be related to the way your paths are passed. Have you tried giving them directly as hardcoded paths?

fjrossello commented 3 years ago

Hi,

Thanks for your prompt reply. As advised, I tried using hardcoded paths in the script, without success. I have also tried placing all the necessary files in the same folder, with the same outcome: an error.

Find the script with hardcoded paths:

CITE-seq-Count -R1 /homevol/frossello/projects/data/fastq/SC/HTO/clipped/S2_L001_R1_001.26bp_5prime.fq.gz \
  -R2 /homevol/frossello/projects/data/fastq/SC/HTO/clipped/S2_L001_R2_001.25bp_5prime.fq.gz \
  -t /homevol/frossello/projects/data/tables/tags.csv \
  -cbf 1 -cbl 16 -umif 17 -umil 26 --bc_collapsing_dist 0 --umi_collapsing_dist 1 --max-errors 1 --start-trim 10 -cells 0 -T 16 \
  -wl /homevol/frossello/projects/output/cite_seq_counts/barcodes.tsv \
  -o /homevol/frossello/projects/output/cite_seq_counts/sample_1

Command when all files were included in the same folder:

CITE-seq-Count -R1 S2_L001_R1_001.26bp_5prime.fq.gz -R2 S2_L001_R2_001.25bp_5prime.fq.gz -t tags.csv -cbf 1 -cbl 16 -umif 17 -umil 26 --bc_collapsing_dist 0 --umi_collapsing_dist 1 --max-errors 1 --start-trim 10 -cells 0 -T 32 -wl barcodes.tsv -o ./test/

Thanks again for your help.

Cheers,

Hoohm commented 3 years ago

Does it work if you call it from the command line directly?

fjrossello commented 3 years ago

Hi,

Sorry, I am not sure what you mean by from the command line directly.

Thanks for your help.

Cheers,

Hoohm commented 3 years ago

I'm assuming you are using a bash script to run CSC. Could you try running it on a local computer, directly from the cmdline? Only on a few reads, doesn't need to be all of them I guess.

fjrossello commented 3 years ago

Thanks for clarifying. I did run the last two examples re hardcoded paths from the command line. As I mentioned, still the same output. Thanks again for your help. Cheers,

fjrossello commented 2 years ago

An update: the origin of my issue was a mixture of read lengths in R2 (from 8 to 25 bases, 0.01% of the total no. of reads), which caused the error mentioned above. Once all R2 reads shorter than 25 bases were discarded, the issue was solved. All R1 reads were 26 bases long. Apologies for the hassle. Cheers,
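For anyone hitting the same problem, the fix described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in this thread: it assumes gzipped FASTQ files with four lines per record and R1/R2 records in the same order, and the function names and the min_r2_len parameter are hypothetical. It tallies the R2 read lengths and writes out only the pairs whose R2 read has at least 25 bases, dropping the matching R1 record as well so that the two files stay in sync.

```python
import gzip
from collections import Counter


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_pairs(r1_in, r2_in, r1_out, r2_out, min_r2_len=25):
    """Keep read pairs whose R2 sequence has at least min_r2_len bases.

    Returns the number of pairs kept and a Counter of all R2 lengths seen.
    """
    lengths = Counter()
    kept = 0
    with gzip.open(r1_in, "rt") as f1, gzip.open(r2_in, "rt") as f2, \
            gzip.open(r1_out, "wt") as o1, gzip.open(r2_out, "wt") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            lengths[len(rec2[1])] += 1
            if len(rec2[1]) >= min_r2_len:
                # Write both mates so R1 and R2 remain paired.
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
    return kept, lengths
```

A call might look like filter_pairs("R1.fq.gz", "R2.fq.gz", "R1.filtered.fq.gz", "R2.filtered.fq.gz"), with placeholder file names. Inspecting the returned length counts first is a quick way to check whether R2 contains reads of mixed length at all.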