GenomiqueENS / toulligQC

A post sequencing QC tool for Oxford Nanopore sequencers

Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed #17

Status: Closed (closed by Radek91 1 year ago)

Radek91 commented 1 year ago

Hi!

I managed to run the latest version of toulligQC (2.3) on the output of a default Guppy basecalling run:

toulligqc \
--report-name "$run_id" \
--telemetry-source ./fastq_hac_400/sequencing_telemetry.js \
--sequencing-summary-source ./fastq_hac_400/sequencing_summary.txt \
--html-report-path ./toolligqc/"$run_id"_tooligqc_"$run_id".html \
--data-report-path ./toolligqc/"$run_id"_tooligqc_"$run_id".data 

However, when I ran it with the demultiplexing files, I got the following error:

toulligqc \
--force \
--report-name "$run_id" \
--barcoding \
--telemetry-source "$run_path"/fastq_hac_400/sequencing_telemetry.js \
--sequencing-summary-source "$run_path"/fastq_hac_400/sequencing_summary.txt \
--sequencing-summary-source "$run_path"/guppy_demultiplexed_pass/barcoding_summary_pass.txt \
--sequencing-summary-source "$run_path"/guppy_demultiplexed_fail/barcoding_summary_fail.txt \
--html-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.html \
--data-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.data \
--barcodes BC01,BC02,BC03,BC04,BC05,BC06,BC07,BC08,BC09,BC10,BC11,BC12,BC13,BC14,BC15,BC16,BC17,BC18,BC19,BC20,BC21,BC22,BC23,BC24
* Start Basecaller sequencing summary extractor
Traceback (most recent call last):
  File "/home/vincent.hahaut/anaconda3/bin/toulligqc", line 33, in <module>
    sys.exit(load_entry_point('toulligqc==2.3', 'console_scripts', 'toulligqc')())
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/toulligqc.py", line 347, in main
    extractor.init()
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/sequencing_summary_extractor.py", line 106, in init
    self.dataframe_1d = self._load_sequencing_summary_data()
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/sequencing_summary_extractor.py", line 408, in _load_sequencing_summary_data
    dataframes_merged = pd.merge(
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge
    op = _MergeOperation(
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 598, in __init__
    _left = _validate_operand(left)
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 2148, in _validate_operand
    raise TypeError(
TypeError: Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed
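
The last pandas frame in the traceback shows that the left operand passed to pd.merge was None rather than a DataFrame. Below is a minimal sketch that reproduces the same kind of error outside toulligQC. The load_summary function is a hypothetical stand-in for any loader that returns None; it is not toulligQC's actual code, and the table contents are made up.

```python
import pandas as pd


def load_summary(path):
    # Hypothetical loader that fails silently and returns None
    # instead of a DataFrame (an assumption, not toulligQC's code).
    return None


# A small, made-up barcoding table to merge against
right = pd.DataFrame(
    {
        "read_id": ["a", "b"],
        "barcode_arrangement": ["barcode19", "unclassified"],
    }
)

try:
    # The left operand is None, so pandas refuses to merge
    pd.merge(load_summary("missing.txt"), right, on="read_id")
except TypeError as err:
    message = str(err)
    print(message)
```

So the message itself only says that one of the inputs was never loaded as a table; it does not say which file was responsible.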

I did not manage to debug it myself. The paths to the barcoding summary files are correct:

head -n 2 "$run_path"/guppy_demultiplexed_pass/barcoding_summary_pass.txt

read_id barcode_arrangement barcode_full_arrangement    barcode_kit barcode_variant barcode_score   barcode_front_id    barcode_front_score barcode_front_refseq    barcode_front_foundseq  barcode_front_foundseq_length   barcode_front_begin_index   barcode_rear_id barcode_rear_score  barcode_rear_refseq barcode_rear_foundseq   barcode_rear_foundseq_length    barcode_rear_end_index  barcode_front_total_trimmed barcode_rear_total_trimmed  barcode_mid_front_id    barcode_mid_front_score barcode_mid_front_end_index barcode_mid_rear_id barcode_mid_rear_score  barcode_mid_rear_end_index  adapter_front_id    adapter_front_score adapter_front_foundseq_len  adapter_front_begin_index   adapter_rear_id adapter_rear_score  adapter_rear_foundseq_len   adapter_rear_end_index  adapter_mid_id  adapter_mid_score   adapter_mid_end_index
fa6dee83-eb61-5169-bddf-8f6c19de6b53    barcode19   NB19_var1   NB  var1    46.3333 NB19_FWD    46.3333 AGGTTAAGTTCCTCGTGCAGTGTCAAGAGATCAGCACCT AGGTAGTCTGTACATAATTCAGAGAGAGGACAT   33  37  NB19_REV    21.9167 GGTGCTGATCTCTTGACACTGCACGAGGAACTTAACCTTAGCAAT   TGCTGCCATTCGGCCAGTGAGTCTTCTCCCAAT   33  84  70  117 unclassified    0   -1  unclassified    0   -1  ADAPTER_LSK109_FWD  37.9286 20  73  ADAPTER_LSK110_REV  43.5556 13  114 unclassified    0   -1
head -n 2 "$run_path"/guppy_demultiplexed_fail/barcoding_summary_fail.txt

read_id barcode_arrangement barcode_full_arrangement    barcode_kit barcode_variant barcode_score   barcode_front_id    barcode_front_score barcode_front_refseq    barcode_front_foundseq  barcode_front_foundseq_length   barcode_front_begin_index   barcode_rear_id barcode_rear_score  barcode_rear_refseq barcode_rear_foundseq   barcode_rear_foundseq_length    barcode_rear_end_index  barcode_front_total_trimmed barcode_rear_total_trimmed  barcode_mid_front_id    barcode_mid_front_score barcode_mid_front_end_index barcode_mid_rear_id barcode_mid_rear_score  barcode_mid_rear_end_index  adapter_front_id    adapter_front_score adapter_front_foundseq_len  adapter_front_begin_index   adapter_rear_id adapter_rear_score  adapter_rear_foundseq_len   adapter_rear_end_index  adapter_mid_id  adapter_mid_score   adapter_mid_end_index
82bb5bfb-5e34-565e-a6b4-6ad6947004b0    unclassified    NB12_var2   NB  var2    39.75   NB12_FWD    39.75   ATTGCTAAGGTTAATCCGATTCTGCTTCTTTCTACCTGCAGCACC   TTGCTACATAGACGGGTGTGCTCTTTTCACTGTTCAG   37  41  NB12_REV    16.25   AGGTGCTGCAGGTAGAAAGAAGCAGAATCGGATTAACCT GGGCTAGGTTTAGCCCCATACTATGTTAGTTGATACC   37  39  0   0   unclassified    0   -1  unclassified    0   -1  ADAPTER_LSK109_FWD  37  22  126 ADAPTER_LSK109_REV  29.8261 12  5   unclassified    0   -1

Any idea why this is happening?

Thank you in advance

alihamraoui commented 1 year ago

Dear @Radek91,

Thanks for reporting this issue.

Could you please send me your environment configuration (Python and pandas versions)?

It would also help to have the first 10 lines of each of your summary files, so that I can reproduce the bug.

You can also try toulligQC with our Docker image (genomicpariscentre/toulligqc:2.3):

$ docker pull genomicpariscentre/toulligqc:2.3
$ docker run -ti \
             -u $(id -u):$(id -g) \
             --rm \
             -v /path/to/basecaller/sequencing/summary/file:/path/to/basecaller/sequencing/summary/file \
             -v /path/to/basecaller/sequencing/telemetry/file:/path/to/basecaller/sequencing/telemetry/file \
             -v /path/to/result/directory:/path/to/result/directory \
             genomicpariscentre/toulligqc:2.3
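
Each path in the -v options above appears twice because Docker's volume syntax is host_path:container_path. Mounting a path at the same location inside the container means the file paths given to toulligqc stay unchanged. Here is a rough sketch that builds such a command as an argument list. The helper name and the /data/run1 paths are made up, and passing the program name toulligqc after the image name is an assumption about how the image is set up.

```python
import os
import shlex

IMAGE = "genomicpariscentre/toulligqc:2.3"  # image name from the pull command


def docker_command(host_dirs, tool_args, image=IMAGE):
    """Build a `docker run` argument list mounting each directory at the same path."""
    cmd = ["docker", "run", "-ti", "--rm", "-u", f"{os.getuid()}:{os.getgid()}"]
    for directory in host_dirs:
        path = os.path.abspath(directory)
        # host_path:container_path, deliberately kept identical
        cmd += ["-v", f"{path}:{path}"]
    # Assumption: the program name is given explicitly after the image name.
    return cmd + [image, "toulligqc"] + list(tool_args)


cmd = docker_command(
    ["/data/run1"],  # hypothetical run directory holding inputs and outputs
    [
        "--report-name", "run1",
        "--sequencing-summary-source",
        "/data/run1/fastq_hac_400/sequencing_summary.txt",
        "--html-report-path", "/data/run1/toulligqc/report.html",
    ],
)
print(shlex.join(cmd))
```

Mounting one parent directory that holds both the input files and the report directory avoids listing every file separately.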

Best, Ali

Radek91 commented 1 year ago

Dear Ali,

Thank you for your fast answer.

Attached files: sequencing_summary.txt, barcoding_summary_pass.txt, barcoding_summary_fail.txt

python --version
Python 3.8.8
pip show pandas
Name: pandas
Version: 1.2.4

I am not familiar with Docker and unfortunately the documentation is a bit limited. How can I translate my second command to the Docker image? The repeated paths are especially confusing.

alihamraoui commented 1 year ago

Dear @Radek91,

As far as I can see, your sequencing_summary.txt is a sequencing summary file that already includes the barcode information (it contains the barcode_arrangement column). So you don't need to provide a barcoding summary file.
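
A quick way to check this for any summary file is to look at its header row. This is a minimal sketch, assuming a tab-separated file with a header line. The function name and the in-memory sample data are made up for illustration.

```python
import csv
import io


def has_barcode_column(handle, column="barcode_arrangement"):
    """Return True if the tab-separated header row contains `column`."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return column in header


# Tiny in-memory stand-ins for real summary files (made-up contents)
with_barcodes = io.StringIO(
    "read_id\tchannel\tbarcode_arrangement\nr1\t12\tbarcode19\n"
)
without_barcodes = io.StringIO("read_id\tchannel\nr1\t12\n")

print(has_barcode_column(with_barcodes))     # True
print(has_barcode_column(without_barcodes))  # False
```

For a real file, open it with open(path, newline="") and pass the file handle to the function.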

My suggestion is to try the following command:

toulligqc \
--force \
--report-name "$run_id" \
--barcoding \
--telemetry-source "$run_path"/fastq_hac_400/sequencing_telemetry.js \
--sequencing-summary-source "$run_path"/fastq_hac_400/sequencing_summary.txt \
--html-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.html \
--data-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.data \
--barcodes BC01,BC02,BC03,BC04,BC05,BC06,BC07,BC08,BC09,BC10,BC11,BC12,BC13,BC14,BC15,BC16,BC17,BC18,BC19,BC20,BC21,BC22,BC23,BC24

I hope this will help. Ali

Radek91 commented 1 year ago

Dear Ali,

Thank you very much for your help. This worked well :-)

I believe this happened because I ran basecalling and demultiplexing in a single command. Then, seeing that it did not produce the barcoding outputs mentioned on your GitHub page, I ran the demultiplexing again separately, which created redundant information.

Have a nice day!