USDA-VS / vSNP

vSNP -- validate SNPs
GNU General Public License v3.0

Test data output? #5

Closed. Thomieh73 closed this issue 5 years ago.

Thomieh73 commented 5 years ago

Hi, I have installed vSNP using your instructions.

Due to an error, I had to uninstall samtools using conda, since the error suggested the setup was not correct. In the process, conda also removed a few other dependencies, including pysam. I then reinstalled both pysam and samtools and tried vSNP.py on the mycobacterium test dataset.

That ended with this output:

runtime: 0:17:34.115335:

average_coverage: 13.8
time_stamp: 2018-12-03_20-33-19
sample_name: 13-1941
species: af
reference_sequence_name: NC_002945.4
R1size: 31.0MB
R2size: 38.8MB
allbam_mapped_reads: 286,205
genome_coverage: 99.02%
ave_coverage: 13.8
ave_read_length: 227.2
unmapped_reads: 1871
unmapped_assembled_contigs: 745
good_snp_count: 675
mlst_type: N/A
octalcode: 640013777377600
sbcode: N/A
hexadecimal_code: 68-0-5F-7E-FF-60
binarycode: 1101000000000010111111111110111111111100000
Q_ave_R1: 34.7
Q30_R1: 89.9%
Q_ave_R2: 27.4
Q30_R2: 41.4%
Path to cumulative stat summary file not found

runtime: 0:19:52.687740:

See files, vSNP has finished alignments

What does it mean when the path to the stat summary is not found? Is that bad?

Thomas

stuber commented 5 years ago

Hi Thomas,

The "stat summary not found" message is an internal message. It is not bad; it looks like everything ran as it should. The SB code came back as N/A, which I thought might be an error, but it wasn’t. SB codes are obtained by cross-referencing the mbovis.org database, and this sample has a new pattern.

Just curious, because I’ve dealt with the same pysam/samtools issue. I’ve had various pysam/samtools versions fail because shared libraries could not be found. I saw this when installing into a conda environment that had many programs in it: my general run environment, which also had tmux installed. My RedHat version of tmux is old (1.8). I could easily upgrade tmux to 2.7 via conda, but that made pysam/samtools fail on shared libraries, so I’ve learned to live with the older tmux. When you first set up the environment in which pysam failed, was it an existing, already active environment, or a newly created environment containing only the dependencies from vSNP's environment.yml? I’ve had no issues when a new environment is created with only the dependencies in that file, but I’d like to know if you saw the error in a newly created environment.

Tod

-- Tod Stuber Computational Biologist USDA-NVSL-VS National Veterinary Services Laboratories Ames, Iowa 515-343-6935


Thomieh73 commented 5 years ago

Hi Tod, it's good to read that the "path not found" message is nothing to worry about. :-) I will ignore it, then.

To answer your question on the pysam/samtools issue, I repeated the vSNP installation to get more details for you. The whole procedure was done inside the vSNP environment, after I had activated it.

At the start, I followed the installation instructions to create the vSNP conda environment as described on your website. The last command was:

sed -i 's/print line.strip()/print(line.strip())/' $(which vcffirstheader)
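The sed call above rewrites a Python 2 print statement in vcffirstheader into a Python 3 print() call, editing the file in place (-i). A minimal sketch of the same substitution applied to a sample line on stdin instead of the installed script; the py3_print_fix helper name and the sample input are made up for illustration, not taken from vcflib:

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the same substitution as the vcffirstheader fix,
# reading from stdin instead of editing the file in place.
py3_print_fix() {
  sed 's/print line.strip()/print(line.strip())/'
}

# Illustrative input only; sed preserves the leading indentation.
printf '        print line.strip()\n' | py3_print_fix
```

Because the pattern requires a space right after "print", an already-converted line such as print(line.strip()) does not match, so running the fix a second time leaves the file unchanged.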

When I then ran the command on the test data, I got the following error:

samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
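This message comes from the Linux dynamic loader: the samtools binary was linked against libcrypto.so.1.0.0 (the OpenSSL 1.0.x library), and no file by that name was found on the library search path. A minimal sketch for listing every unresolved library of a binary, assuming GNU/Linux ldd output lines of the form `libname => not found`; the missing_libs helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ldd-style output on stdin and print the name of
# each shared library the loader could not resolve.
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage (Linux): inspect the samtools binary found on the current PATH.
# ldd "$(command -v samtools)" | missing_libs
```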

I then thought samtools was not properly installed, so I removed it, which changed the environment quite a bit:

environment location: /work/projects/nn9305k/src/anaconda3/envs/vsnp

  removed specs: 
    - samtools

The following NEW packages will be INSTALLED:

    krb5:            1.16.2-hbb41f41_0     conda-forge

The following packages will be REMOVED:

    pysam:           0.15.1-py36h0380709_0 bioconda   
    samtools:        1.9-h8ee4bcc_1        bioconda   

The following packages will be UPDATED:

    ca-certificates: 2018.03.07-0                      --> 2018.11.29-ha4d7672_0 conda-forge
    certifi:         2018.10.15-py36_0                 --> 2018.11.29-py36_1000  conda-forge
    curl:            7.62.0-hbc83047_0                 --> 7.62.0-h74213dd_0     conda-forge
    libcurl:         7.62.0-h20c2e04_0                 --> 7.62.0-hbdb9355_0     conda-forge

The following packages will be DOWNGRADED:

    libssh2:         1.8.0-h1ba5d50_4                  --> 1.8.0-h5b517e9_3      conda-forge
    openssl:         1.1.1a-h7b6447c_0                 --> 1.0.2p-h470a237_1     conda-forge
    python:          3.6.7-h0371630_0                  --> 3.6.6-h5001a0f_3      conda-forge
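A likely explanation, which the thread does not confirm: OpenSSL 1.0.2 ships libcrypto.so.1.0.0, while OpenSSL 1.1.1 ships libcrypto.so.1.1, so the openssl downgrade in this plan would put the library samtools was missing back into the environment. A minimal sketch for checking which libcrypto files a library directory provides, assuming the conda environment keeps its shared libraries under $CONDA_PREFIX/lib; the list_libcrypto helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the libcrypto shared-object file names found in a
# library directory (defaults to the active conda environment's lib/).
list_libcrypto() {
  local libdir="${1:-${CONDA_PREFIX:-}/lib}"
  local f
  for f in "$libdir"/libcrypto.so*; do
    # Without a match the glob stays literal, so check the file exists.
    [ -e "$f" ] && basename "$f"
  done
  return 0
}
```

Run inside the activated environment, `list_libcrypto` with no argument would show whether libcrypto.so.1.0.0 is present before and after such a change.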

And then I installed samtools and pysam again in one go:

conda install -c bioconda pysam samtools

This produced the following plan:

## Package Plan ##

  environment location: /work/projects/nn9305k/src/anaconda3/envs/vsnp

  added / updated specs: 
    - pysam
    - samtools

The following NEW packages will be INSTALLED:

    pysam:    0.15.1-py36h0380709_0 bioconda
    samtools: 1.9-h8ee4bcc_1        bioconda

After that I ran the pipeline and I got the results I presented in the message above.

The better method for installation

Since uninstalling and reinstalling samtools and pysam is not really necessary (the same versions are installed again), I set up the installation the way you described it to Girum in an email.

Below are the commands I used:

Updating vcflib

Testing the installation

I created an interactive Slurm job to run the analysis on our cluster and downloaded the mycobacterium test set:

This ran without any error messages and produced the same results as above. I think the only change needed to make it work is updating vcflib; that step should be added to your instructions page.

I will ask Girum to test his data on the new installation.

Thomieh73 commented 5 years ago

This issue can be closed.