Isabella136 / AmrPlusPlus_SNP

GNU General Public License v3.0

Column blank in SNPConfirmed matrix from AMR++ run #22

Closed: LLansing closed this issue 1 year ago

LLansing commented 1 year ago

In my AMR++ run, a single column corresponding to one of my input samples is blank in the SNPConfirmed result matrix. The sample name is empty in the header row, and the AMR counts are missing for this column in all subsequent rows; the gap shows up as two consecutive tab characters with nothing in between.

In the Nextflow output, the missing sample is shown to have completed the runsnp step, and there is no indication of an error.
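
For anyone trying to reproduce the symptom, a minimal sketch for locating such blank columns is shown here. It assumes only what is described above: a tab-separated matrix whose first row holds the sample names. The function name `find_blank_columns` and the sample data are hypothetical and are not part of AMR++ or AmrPlusPlus_SNP.

```python
import csv
import io


def find_blank_columns(tsv_text):
    """Return the 0-based indices of columns whose header and every cell are empty.

    Assumes a tab-separated matrix with sample names in the first row
    (an assumption based on the description in this issue).
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    blank = []
    for i, name in enumerate(header):
        # Rows shorter than the header are treated as having empty cells.
        cells = [row[i] if i < len(row) else "" for row in body]
        if name.strip() == "" and all(cell.strip() == "" for cell in cells):
            blank.append(i)
    return blank


# Hypothetical three-sample matrix in which the second sample column is blank.
example = (
    "gene_accession\tS1\t\tS3\n"
    "geneA\t1\t\t2\n"
    "geneB\t0\t\t5\n"
)
print(find_blank_columns(example))
```

To check a real result file, the text could be read with `open(path).read()` and passed to the same function; the path is whatever the run wrote for the SNPConfirmed matrix.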

Isabella136 commented 1 year ago

Hi @LLansing, I have a couple of questions:

  1. What are the names of the original input file and the SAM file for this sample?
  2. Is the column located in the middle of the matrix or at the end?
  3. Is the column also missing from the previous version of the matrix? If not, can you tell me the name of the column?
  4. In the ResistomeResults output folder, do you see a folder called {sample_name}_SNPs? Does it contain any files?

LLansing commented 1 year ago

  1. As described in this issue, 18 samples were missing from the SNP matrix in addition to the 1 sample that presumably was meant to fill the empty column (19 missing samples total). The input files for the empty-column sample were SRR13495847_{1,2}.fastq.gz (paired input). All of the samples, both the successful ones and the missing ones, follow the naming pattern SRR13495###_{1,2}.fastq.gz, and I see no pattern to the missing samples (e.g. they're not consecutive). As for the SAM file, I'm not sure what its name was: AMR++ created the SAM files as the pipeline ran, and I had to delete various intermediate files and results for the sake of storage space. As noted in the other issue, a previous run with the same samples had a different set of 25 samples missing and 2 empty columns, with only 3 missing samples common to both runs.

  2. The column is somewhere in the middle. Also, I'm not sure whether this is related, but there is a peculiar row at the bottom of the matrix with gene_accession = 0. All other columns in that row are empty (visible as successive tab characters) except for 2, which also have 0 as their value.

  3. The column name of the empty column is SRR13495847 (same as the sample name). It is not missing from the regular, non-SNP confirmed AMR table (which is complete, by the way). It is also not one of the missing columns/samples from the previous run.

  4. Unfortunately, I've already removed the ResistomeResults folder for the sake of storage. I'm not sure whether the {sample_name}_SNPs folder was in there or whether it contained any files.
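
Since the regular AMR table is complete while the SNPConfirmed matrix is not, the missing samples can be listed by comparing the two header rows. The following is a rough sketch under the assumption that both files are tab-separated with `gene_accession` as the first header field; the function name `missing_samples` and the example headers are hypothetical.

```python
def missing_samples(full_header, snp_header):
    """List sample names present in the full matrix header but absent from the SNP one.

    Both arguments are raw header lines. The first field is assumed to be the
    gene_accession label and is skipped; empty header fields are ignored.
    """
    full = [n for n in full_header.rstrip("\n").split("\t")[1:] if n]
    snp = {n for n in snp_header.rstrip("\n").split("\t")[1:] if n}
    # Preserve the column order of the full matrix.
    return [n for n in full if n not in snp]


# Hypothetical headers: the SNP matrix lacks one sample entirely and has one blank column.
full_header = "gene_accession\tSRR_A\tSRR_B\tSRR_C\tSRR_D\n"
snp_header = "gene_accession\tSRR_A\t\tSRR_D\n"
print(missing_samples(full_header, snp_header))
```

A blank header field and a fully absent column both end up in the returned list, so the result covers the empty-column sample as well as the samples missing outright.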

Isabella136 commented 1 year ago

This leads me to believe that this is more likely an error in how the AMR++ pipeline and the AMRPlusPlus_SNP script are communicating. I'll go ahead and make a note of that in the AMR++ repository.