AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

Version mismatch between Git Clone and Docker. #41

Closed iamyingzhou closed 1 year ago

iamyingzhou commented 1 year ago

When using AA, I have consistently encountered an error message stating "mosek license not found". After carefully reviewing the code in run_paa_docker.py, I observed a discrepancy between the mounted volume path used for Docker ("/home/programs/mosek/8/licenses") and the value of MOSEKLM_LICENSE_FILE in the run_paa_script.sh script inside the Docker container ("/home/mosek/"). This mismatch between paths appears to be the cause of the error.

jluebeck commented 1 year ago

Hi,

Thanks for checking in about this. Do you have a Mosek license set on your host system? We are not allowed to package the license with the docker image so the runscript will attempt to mount it into the image. If you have a Mosek license on the host system, where is it located?

If that is not the source of the issue, can you confirm which version of the Docker image you have? This may be a mismatch between the version of the runscript and the version of the docker image. If you have not already tried updating both your version of the runscript and also pulling the latest docker image that may help.

I am happy to review any logs you are able to provide about the run to help debug.

Thanks, Jens

iamyingzhou commented 1 year ago

Hello! Thank you for your response. I placed my mosek.lic file inside the $HOME/mosek/ directory.
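
Before involving Docker at all, it can help to confirm that the license file is present on the host. The following is a minimal sketch, assuming the license is named mosek.lic and lives in $HOME/mosek/ as described above. The helper name find_mosek_license is hypothetical and is not part of AmpliconSuite-pipeline.

```python
from pathlib import Path
from typing import Optional


def find_mosek_license(license_dir: Optional[str] = None) -> Optional[Path]:
    """Return the path to mosek.lic if it exists in license_dir, else None.

    Assumption: when no directory is given, look in $HOME/mosek, the
    location mentioned in this thread.
    """
    base = Path(license_dir) if license_dir else Path.home() / "mosek"
    candidate = base / "mosek.lic"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_mosek_license()
    if found:
        print(f"License found: {found}")
    else:
        print("No mosek.lic found in $HOME/mosek")
```

If this reports no license, the container has nothing to mount, regardless of which path the runscript uses.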

I was able to resolve the error by downloading run_paa_docker.py directly from GitHub. However, when I obtain run_paa_docker.py using the command "git clone https://github.com/jluebeck/AmpliconSuite-pipeline.git", I still encounter the error. The copies of run_paa_docker.py obtained through these two methods are different. I guess the version obtained through git clone is older than the one on GitHub?

jluebeck commented 1 year ago

Thanks - I think the issue was that an old fork was still being pulled from after AmpliconSuite-pipeline was moved to the AmpliconSuite GitHub organization. I have now deleted that fork. If you pull again, the old URL should redirect to the new version of AmpliconSuite. If not, please change the URL accordingly: https://github.com/AmpliconSuite/AmpliconSuite-pipeline.
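
For an existing clone, changing the URL could look roughly like the sketch below. It assumes the remote is named "origin" (the git clone default) and that git is on the PATH. The function names are hypothetical; "git remote set-url" and "git pull" are standard git commands.

```python
import subprocess
from typing import List

# URL of the repository under the AmpliconSuite organization (from this thread).
NEW_URL = "https://github.com/AmpliconSuite/AmpliconSuite-pipeline"


def set_url_command(new_url: str = NEW_URL, remote: str = "origin") -> List[str]:
    """Build the git command that points a remote at new_url."""
    return ["git", "remote", "set-url", remote, new_url]


def repoint_clone(repo_dir: str, new_url: str = NEW_URL) -> None:
    """Update the remote URL of the clone in repo_dir, then pull.

    Assumption: the remote is called "origin" and the current branch
    tracks it, as in a fresh git clone.
    """
    subprocess.run(set_url_command(new_url), cwd=repo_dir, check=True)
    subprocess.run(["git", "pull"], cwd=repo_dir, check=True)
```

Calling repoint_clone("AmpliconSuite-pipeline") from the parent directory of the clone would apply both steps. Removing the old directory and cloning again from the new URL achieves the same result.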

iamyingzhou commented 1 year ago

Thanks!

iamyingzhou commented 1 year ago

I'm sorry, I still have a few more questions.

  1. Is AA able to accept CRAM format? If not, is it necessary to convert it using samtools?
  2. After converting the CRAM file to BAM, do I need to do duplicate removal and sort the BAM file and create an index? Does AA have these operations built-in?
  3. I am running AA on DNAnexus. Is there a recommended installation method? Currently, I am using Docker, but I noticed in your recent documentation that Docker is listed under option D. Is it not user-friendly enough?

jluebeck commented 1 year ago

Hi,

  1. AmpliconSuite-pipeline will also function on coordinate-sorted CRAM files, provided that the CRAM reference is in place.
  2. No need to convert CRAM to BAM. However, regardless of the input being formatted as CRAM or BAM, it must be coordinate sorted. If the user starts the process with a .bam or .cram, we leave it to the user to coordinate-sort their inputs before running. It is very rare that people are starting from name-sorted or unsorted inputs anyways.
  3. I am not very familiar with using DNAnexus, sorry. If you are able to deploy a docker or singularity container on the platform that may be the easiest solution. The container options are listed last since we expect that only users with special constraints on tool installation will need them. The container option should be user-friendly, and we welcome feedback on ways that experience can be improved.
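
One way to check the coordinate-sorting requirement before a run is to inspect the SO tag on the @HD line of the file header, which can be printed with "samtools view -H" for both BAM and CRAM. The sketch below only parses header text that has already been obtained. The function names are hypothetical and this is not a check the pipeline is known to perform itself.

```python
from typing import Optional


def header_sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


def is_coordinate_sorted(header_text: str) -> bool:
    """True only if the header declares SO:coordinate."""
    return header_sort_order(header_text) == "coordinate"
```

The header only records what the producing tool declared, so a missing SO tag means the order is unknown rather than that the file is unsorted.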

Thanks, Jens

iamyingzhou commented 1 year ago

Thank you so much! Should I pass the CRAM file directly to the "--bam" parameter? Another reason for converting CRAM to BAM is that I intend to use GATK to remove duplicates from the sequencing file, as shown in the following command: "gatk MarkDuplicates --INPUT sample.bam --OUTPUT sample.rmdup.bam --METRICS_FILE sample.metrics --VALIDATION_STRINGENCY SILENT --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --ASSUME_SORT_ORDER "queryname" --CREATE_MD5_FILE true". Is it necessary for me to perform this step, or is it already included as part of AA?

jluebeck commented 1 year ago

Yes, the CRAM file can be passed as the argument to --bam. The pipeline will work without marking duplicates. Duplicates are only removed if the user provides fastq files to the pipeline.

iamyingzhou commented 1 year ago

Thank you!