Closed LLansing closed 1 year ago
Hi @LLansing, I have a couple of questions:
In the ResistomeResults output folder, do you see a folder called {sample_name}_SNPs? Does it contain any files?
As described in this issue, there were 18 samples that were missing from the SNP matrix in addition to the 1 sample which presumably was meant to fill the empty column (19 missing samples total). The input files for the empty-column sample were SRR13495847_{1,2}.fastq.gz (paired input). All of the samples, both successful and missing, follow the naming pattern SRR13495###_{1,2}.fastq.gz, and I see no pattern to the missing samples (e.g. they're not consecutive).
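For anyone wanting to reproduce the count of missing samples, a minimal sketch of the comparison follows. It assumes the matrix is tab-separated with the gene label in the first column and one column per sample, and that sample names are the FASTQ file names with the `_1`/`_2` suffix stripped. The function names are hypothetical and not part of AMR++.

```python
import re

def sample_names_from_fastq(filenames):
    """Derive sample names from paired FASTQ names such as SRR13495847_1.fastq.gz."""
    names = set()
    for fn in filenames:
        # Assumption: inputs follow the <sample>_{1,2}.fastq.gz naming pattern.
        m = re.match(r"(.+)_[12]\.fastq\.gz$", fn)
        if m:
            names.add(m.group(1))
    return names

def missing_samples(filenames, header_line):
    """Return sample names that have input files but no column in the matrix header."""
    # Assumption: the header is tab-separated and its first field labels the gene column.
    columns = {c.strip() for c in header_line.rstrip("\r\n").split("\t")[1:]}
    return sorted(sample_names_from_fastq(filenames) - columns)
```

Calling `missing_samples` with the list of input file names and the first line of the SNP-confirmed matrix would list every sample with no named column, which covers both the dropped samples and the one whose header is blank.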
As for the SAM file, I'm not sure what those file names are, since AMR++ created them as the pipeline ran, and I had to delete various intermediate files and results for the sake of storage space.
As noted in the other issue, a previous run with the same samples had a different set of 25 samples missing, and 2 empty columns, with only 3 samples common across both runs.
The column is somewhere in the middle. Also, I'm not sure whether this is related, but there is a peculiar row at the bottom of the matrix with gene_accession = 0 and every other column empty (visible as successive tab characters) except 2, which also have 0 as their value.
The column name of the empty column is SRR13495847 (same as the sample name). It is not missing from the regular, non-SNP confirmed AMR table (which is complete, by the way). It is also not one of the missing columns/samples from the previous run.
Unfortunately, I've already removed the ResistomeResults folder for the sake of storage. I'm not sure whether the {sample_name}_SNPs folder was in there or if it contained any files.
This leads me to believe that this is more likely an error with how the AMR++ pipeline and the AMRPlusPlus_SNP script are communicating. I'll go ahead and make a note of that in the AMR++ repository.
In my AMR++ run a single column corresponding to 1 of my input samples is blank in the SNPConfirmed result matrix. The sample name is empty in the header row and the AMR counts are missing for this column in all the subsequent rows, identified by two tab characters with nothing in between.
In the Nextflow output the missing sample is shown to have performed the runsnp step and there is no indication of error.
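The blank column described above can be located programmatically. The sketch below is only an illustration under the assumption that the SNP-confirmed matrix is a tab-separated text file with a header row. `find_blank_columns` is a hypothetical helper, not part of AMR++ or AMRPlusPlus_SNP.

```python
def find_blank_columns(lines):
    """Return (index, header_name) for every column whose data cells are all empty."""
    # Split on tabs only, so two adjacent tabs yield an empty cell.
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    blank = []
    for i, name in enumerate(header):
        # A column counts as blank if each row is either too short or empty at i.
        if all(i >= len(row) or not row[i].strip() for row in data):
            blank.append((i, name.strip()))
    return blank
```

Passing the lines of the matrix file (for example, from `open(path).readlines()`) returns the position of each blank column together with whatever header text it has, so it works whether or not the sample name survived in the header row.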