dsarov / ARDaP

Comprehensive resistance detection from WGS data

ReferenceAlignment error #32

Closed Rutaiwan closed 7 months ago

Rutaiwan commented 9 months ago

Hi,

Could you please help with this error? I always get this ReferenceAlignment error when running ARDaP with B. pseudomallei.

Thank you, Rutaiwan

[1e/43724e] process > Trimmomatic (SRR2975590_1)        [100%] 1 of 1
[9c/5c9064] process > Downsample (SRR2975590_1)         [100%] 1 of 1
[7b/2d0e91] process > ReferenceAlignment (SRR2975590_1) [100%] 5 of 5, failed: 5, retries: 4
[-        ] process > Deduplicate                       -
[-        ] process > VariantCalling                    -
[-        ] process > SqlSnpsIndels                     -
[-        ] process > R_report                          -
[86/227091] NOTE: Process `ReferenceAlignment (SRR2975590_1)` terminated with an error exit status (1) -- Execution is retried (4)
Error executing process > 'ReferenceAlignment (SRR2975590_1)'

Caused by:
  Process `ReferenceAlignment (SRR2975590_1)` terminated with an error exit status (1)

dsarov commented 9 months ago

Hi Rutaiwan,

Could you possibly send me the nextflow.log file from this run? Have you had this issue with other genomes? Has the software ever fully completed a run before?

Thanks,

Derek

dsarov commented 9 months ago

SRR2975590 completed on my system. I think there might be something wrong with the way the reads are paired? Usually the sample name would be SRR2975590 not SRR2975590_1.

You could try redownloading from the SRA database and making sure that the files are correctly paired and in the correct format.

These are the commands I used to download the data:

curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR297/000/SRR2975590/SRR2975590_1.fastq.gz -o SRR2975590_Comparative_genomics_of_Burkholderia_spp._Sample_INT2-BP100_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR297/000/SRR2975590/SRR2975590_2.fastq.gz -o SRR2975590_Comparative_genomics_of_Burkholderia_spp._Sample_INT2-BP100_2.fastq.gz
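
One quick way to check that two downloaded files are correctly paired is to compare their read counts, since each FASTQ record normally spans four lines. The sketch below assumes gzip-compressed files with standard four-line records; `count_reads` and `check_pair` are illustrative helper names, not part of ARDaP.

```shell
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Compare the read counts of the forward and reverse files.
check_pair() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads"
        return 1
    fi
}
```

For example, `check_pair sample_1.fastq.gz sample_2.fastq.gz` (placeholder file names) reports a mismatch if one of the downloads was truncated.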

Rutaiwan commented 9 months ago

Hi Derek,

Thank you for your reply. I assume something went wrong during my installation, or I entered the path to my reads incorrectly. Could you please let me know the command you used to run the process?

I provided the path to the paired reads, which are all in the same folder: nextflow run main.nf --fastq /path_to_reads/*

Cheers,

dsarov commented 9 months ago

Oh, I think I understand what is going wrong now. The --fastq flag is only needed if your fastq files use a different naming format (e.g. you have reads named Bp_strain_1_sequence.fastq.gz and Bp_strain_2_sequence.fastq.gz instead of Bp_strain_1.fastq.gz and Bp_strain_2.fastq.gz).

Try just running the tool from the directory containing your reads but don't include the --fastq flag.
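
Before launching from the reads directory, it may help to confirm that every forward file has a mate. The sketch below assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming that the comment above appears to describe as the default; `list_unpaired` is a hypothetical helper, not an ARDaP command.

```shell
# Print the sample name of every *_1.fastq.gz in a directory
# that has no matching *_2.fastq.gz file.
list_unpaired() {
    local dir="${1:-.}" f sample
    for f in "$dir"/*_1.fastq.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        sample=$(basename "$f" _1.fastq.gz)
        [ -e "$dir/${sample}_2.fastq.gz" ] || echo "$sample"
    done
}
```

Running `list_unpaired .` in the reads directory prints nothing when every sample has both files.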

Rutaiwan commented 9 months ago

Thanks again, Derek. I tried to solve the issue following your suggestion. The software has never fully completed a run. Please find the nextflow.log attached. Also, I edited the GATK version in env.yaml (gatk4=4.1.8 --> gatk=4.5.0) during installation because the pinned version requires an old version of openjdk. I'm not sure whether this could cause any issue with the ReferenceAlignment process.

Attachment: .nextflow.log

dsarov commented 9 months ago

It looks like it's an issue with the resfinder installation. The specific error is near the end of the log file and is due to resfinder not being able to find the python module "tabulate".
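
A small sketch for pulling this kind of error out of a long log: it assumes the failure shows up as Python's standard "No module named '...'" message, and `find_missing_modules` is a hypothetical helper name. The exact wording in any given log may differ.

```shell
# List the distinct "No module named ..." messages found in a log file.
# Defaults to .nextflow.log in the current directory.
find_missing_modules() {
    grep -o "No module named '[^']*'" "${1:-.nextflow.log}" | sort -u
}
```

Run from the launch directory, `find_missing_modules` would print one line per missing module, or nothing if no such message is present.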

When you installed ARDaP, did you also install the resfinder dependencies?

The default install command is "pip3 install tabulate biopython cgecore gitpython python-dateutil". Make sure your ardap conda environment is activated when you run it, so that the modules are available to the whole pipeline.

Cheers,

Derek
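
To see whether the resfinder dependencies listed above are visible to the active environment, one option is to try importing each module. This is a sketch only: `check_modules` is a hypothetical helper, and it tests whichever `python3` is first on the PATH, so the conda environment needs to be activated beforehand.

```shell
# Report which Python modules cannot be imported by the python3 on PATH.
# Import names differ from pip names:
#   biopython -> Bio, gitpython -> git, python-dateutil -> dateutil
check_modules() {
    local m missing=0
    for m in "$@"; do
        if python3 -c "import $m" 2>/dev/null; then
            echo "ok       $m"
        else
            echo "MISSING  $m"
            missing=1
        fi
    done
    return "$missing"
}
```

With the environment active, `check_modules tabulate Bio cgecore git dateutil` returns a non-zero status if any of the five is missing.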

dsarov commented 9 months ago

I think gatk 4.5 should be ok. It hasn't gotten to the variant calling part of the pipeline so it isn't a problem yet.

Rutaiwan commented 8 months ago

Hi Derek,

I couldn't install the ResFinder dependencies with the command pip3 install tabulate biopython cgecore gitpython python-dateutil. The errors are below. Maybe this is why ReferenceAlignment never completes successfully.


Collecting biopython
  Using cached biopython-1.83.tar.gz (19.4 MB)
  Preparing metadata (setup.py) ... done
Collecting cgecore
  Using cached cgecore-1.5.6-py3-none-any.whl.metadata (977 bytes)
Collecting gitpython
  Using cached GitPython-3.1.42-py3-none-any.whl.metadata (12 kB)
Collecting python-dateutil
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting numpy (from biopython)
  Using cached numpy-1.26.4.tar.gz (15.8 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install backend dependencies did not run successfully.
  │ exit code: 1
  ╰─> [287 lines of output]
      Picked up _JAVA_OPTIONS: -Duser.home=/home/rdusadeepong -Duser.name=rdusadeepong -Duser.timezone=Australia/Melbourne
      Collecting patchelf>=0.11.0
        Using cached patchelf-0.17.2.1.tar.gz (167 kB)
        Installing build dependencies: started
        Installing build dependencies: finished with status 'done'
        Getting requirements to build wheel: started
        Getting requirements to build wheel: finished with status 'done'
        Preparing metadata (pyproject.toml): started
        Preparing metadata (pyproject.toml): finished with status 'done'
      Building wheels for collected packages: patchelf
        Building wheel for patchelf (pyproject.toml): started
        Building wheel for patchelf (pyproject.toml): finished with status 'error'
        error: subprocess-exited-with-error

dsarov commented 8 months ago

Hi Rutaiwan,

Yes, that's definitely the issue. That part of the pipeline is trying to run resfinder but it isn't finishing successfully so the entire pipeline is failing. You could try having a look at the resfinder github to see if their instructions help you install the dependencies https://github.com/cadms/resfinder

Perhaps just try to install one by one to see which one/s don't work?
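
The one-by-one approach could be scripted roughly as follows. `install_each` and the `INSTALL_CMD` override are hypothetical names introduced for this sketch; by default it calls `pip3 install` and writes each package's output to its own log file in the current directory.

```shell
# Install packages one at a time so a single failure is easy to spot.
# INSTALL_CMD is a hypothetical override (defaults to "pip3 install").
install_each() {
    local pkg failed=""
    for pkg in "$@"; do
        # Unquoted on purpose so the command splits into separate words.
        if ${INSTALL_CMD:-pip3 install} "$pkg" >"install_${pkg}.log" 2>&1; then
            echo "installed: $pkg"
        else
            echo "FAILED:    $pkg (see install_${pkg}.log)"
            failed="$failed $pkg"
        fi
    done
    [ -z "$failed" ]    # non-zero status if anything failed
}
```

For example, `install_each tabulate biopython cgecore gitpython python-dateutil` would show which package triggers the build error, with the full pip output kept in the matching log file.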

dsarov commented 8 months ago

Actually the instructions here are probably better --> https://pypi.org/project/resfinder/

You could try just pip install resfinder to see if that works