pipeline_readqc error in pandas concat

nickilott commented 5 years ago

Hi,

In the summarizeFastQC task in pipeline_readqc I am coming across the following error:

Traceback (most recent call last):
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/ruffus/task.py", line 544, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/cgatpipelines/tools/pipeline_readqc.py", line 377, in summarizeFastQC
    all_files)
  File "/gfs/devel/nilott/cgat-developers-v0/cgat-flow/cgatpipelines/tasks/readqc.py", line 274, in read_fastqc
    df = pd.concat(dd, keys=tracks, names=["track"])
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 212, in concat
    copy=copy)
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 363, in __init__
    self.new_axes = self._get_new_axes()
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 443, in _get_new_axes
    new_axes[self.axis] = self._get_concat_axis()
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 498, in _get_concat_axis
    self.levels, self.names)
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 596, in _make_concat_multiindex
    mapped = level.get_indexer(hlevel)
  File "/gfs/devel/nilott/cgat-developers-v0/conda-install/envs/cgat-f/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Does anyone know the cause of this?

Thanks

Nick

Acribbs commented 5 years ago

Hmm, I don't think I have seen this error before. What does head show for your fastqc_data.txt?

Which fastqc and pandas are you using?

nickilott commented 5 years ago

FastQC v0.11.7, pandas 0.22.0

nickilott commented 5 years ago

$ head fastqc_data.txt

FastQC 0.11.7

Basic Statistics pass

Measure Value

Filename Biopsy-Caecum-HEALTHY-R0.fastq.1.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 29656
Sequences flagged as poor quality 0
Sequence length 300
%GC 51

Acribbs commented 5 years ago

I see your Basic Statistics doesn't read >> Basic Statistics pass

Do you see >> END MODULE after the %GC 51 line?
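The check being asked about here can be sketched in Python. This is only a sketch: it assumes the raw fastqc_data.txt opens the module with a line starting `>>Basic Statistics` and closes it with `>>END_MODULE`, which is how FastQC normally writes the file (the `>>` markers may simply not be visible in the pasted output). The function name `has_complete_basic_statistics` and the sample text are hypothetical and not part of cgat-flow.

```python
def has_complete_basic_statistics(text):
    """Return True if a '>>Basic Statistics' module opens and is later
    closed by a '>>END_MODULE' line (assumed FastQC module markers)."""
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif in_module and line.startswith(">>END_MODULE"):
            return True
    return False


# Hypothetical excerpt in the assumed raw, tab-separated layout.
sample = (
    "##FastQC\t0.11.7\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "%GC\t51\n"
    ">>END_MODULE\n"
)
print(has_complete_basic_statistics(sample))
```

Running the same function on the real file contents (read with `open(path).read()`) would show whether the module markers are present in the file itself or only missing from the pasted output.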

Acribbs commented 5 years ago

I will test on my end as I now have the same versions of pandas and fastqc as you

nickilott commented 5 years ago

yes >> END module is there

nickilott commented 5 years ago

Thanks for the help

Acribbs commented 5 years ago

I'm all out. I have just tested it on my end with the same pandas and fastqc versions and it generates the summarizeFastQC output correctly.

I have looked at the code that summarizeFastQC calls and I can't see anything obvious that could be the error in the pd.concat function. Your files don't start with a number do they?

nickilott commented 5 years ago

Thanks for your help. No, the files don't start with a number. It's not an obvious problem, and it's strange because the fastqc_data.txt output is not in the same format as in previous runs with a different data set but the same version of fastqc. I will try re-running and see what happens.

nickilott commented 5 years ago

Hi Adam,

It does look like it's something to do with the filenames. I'm not quite sure what the issue is, but it's not to do with the code.
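One way filenames could trigger this error, offered as an assumption rather than something confirmed in this thread: the failing call is `pd.concat(dd, keys=tracks, names=["track"])`, and the message complains about a non-unique index, so two input files resolving to the same track name would be a plausible cause. A minimal sketch of a check for that is below. The helper `find_duplicate_tracks` and the example track names are hypothetical and not part of cgat-flow.

```python
from collections import Counter


def find_duplicate_tracks(tracks):
    """Return the track names that occur more than once, in first-seen order."""
    counts = Counter(tracks)
    return [name for name in counts if counts[name] > 1]


# Hypothetical track names: two input files collapse to the same name.
tracks = [
    "Biopsy-Caecum-HEALTHY-R0",
    "Biopsy-Ileum-HEALTHY-R0",
    "Biopsy-Caecum-HEALTHY-R0",
]
duplicates = find_duplicate_tracks(tracks)
print(duplicates)
```

If the list built for `keys=` is passed through a check like this before the concat and comes back non-empty, the names it reports are the ones to look at first.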

Acribbs commented 5 years ago

OK great. If you manage to work it out, I'd love to know what it was. Good luck.

cgat-developers / cgat-flow

pipeline_readqc error in pandas concat #87
