Closed DrB-S closed 11 months ago
Do you still get this error when you use your version of HeatCluster?
It looks like the error is due to this line:
within_cluster_snps = sorted_df.apply(lambda row: row[row < 500].sum(), axis=1)
# Add 'Within_Cluster_SNPs' column to the sorted DataFrame
sorted_df['Within_Cluster_SNPs'] = within_cluster_snps.values
# Calculate silhouette scores for different numbers of clusters
silhouette_scores = []
if numSamples < 11:
    upper_range = numSamples
else:
    upper_range = 11
for n_clusters in range(2, upper_range):
    kmeans = KMeans(n_clusters=n_clusters, n_init=10)
    cluster_labels = kmeans.fit_predict(sorted_df.values)
    silhouette_scores.append(silhouette_score(sorted_df.values, cluster_labels))
Do you have any SNP distances < 500?
This should definitely be a snp matrix added to https://github.com/DrB-S/HeatCluster/tree/main/test
snp_matrix.txt has nothing but 0's as values.
snp-dists 0.8.2,2023TB-0068-3,2023TB-0069-3,2023TB-0070-3,2023TB-0071-3,2023TB-0072-3,2023TB-0073-3,2023TB-0074-3,2023TB-0075-3,2023TB-0076-3,2023TB-0077-3,2023TB-0079-3,2023TB-0080-3,2023TB-0081-3,2023TB-0082,2023TB-0083,2023TB-0084,2023TB-0085,2023TB-0086,2023TB-0087,2023TB-0088,2023TB-0089,2023TB-0090,2023TB-0091,2023TB-0092,2023TB-0094-3,2023TB-0095-3,2023TB-0096-3,2023TB-0097-3,2023TB-0098-3,2023TB-0100-3,2023TB-0113,2023TB-0114,2023TB-0115,2023TB-0116,2023TB-0117,2023TB-0122,2023TB-0123,2023TB-0124,IS7493_GCF_000221985.1.gz
2023TB-0068-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0069-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0070-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0071-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0072-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0073-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0074-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0075-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0076-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0077-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0079-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0080-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0081-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0082,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0083,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0084,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0085,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0086,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0087,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0088,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0089,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0090,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0091,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0092,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0094-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0095-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0096-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0097-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0098-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0100-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0113,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0114,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0115,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0117,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0124,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
IS7493_GCF_000221985.1.gz,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
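A matrix in which every row is identical cannot be split into two clusters, and as far as I know `silhouette_score` only accepts between 2 and `n_samples - 1` distinct labels, which would explain the "only 1 label" ValueError. Below is a minimal sketch of a pre-check that could run before the KMeans loop. The helper names `distinct_rows` and `candidate_cluster_counts` are hypothetical and not part of HeatCluster; the cap of 10 clusters mirrors the `upper_range = 11` in the snippet quoted above.

```python
def distinct_rows(matrix):
    """Return the number of distinct rows in a list-of-lists matrix."""
    return len({tuple(row) for row in matrix})


def candidate_cluster_counts(matrix, max_clusters=10):
    """Cluster counts worth trying: 2 .. min(max_clusters, distinct rows, n_samples - 1).

    Returns an empty list when the data cannot support two clusters,
    so the caller can skip the silhouette step entirely.
    """
    n_samples = len(matrix)
    upper = min(max_clusters, distinct_rows(matrix), n_samples - 1)
    return list(range(2, upper + 1))


# An all-zero 39 x 39 matrix like the one posted in this issue.
all_zero = [[0] * 39 for _ in range(39)]
print(candidate_cluster_counts(all_zero))  # -> []

# A tiny matrix with three distinct rows (hypothetical values).
small = [[0, 1, 9], [1, 0, 9], [9, 9, 0]]
print(candidate_cluster_counts(small))  # -> [2]
```

When the returned list is empty, the script could skip clustering and still draw the heatmap rather than crash.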
I didn't point to any config file in my command-line: nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 90 --maxcpus 120 --reads reads --outgroup NC_000962.3.fna
Oh snap!
What is the roary/summary_statistics.txt?
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 115
Shell genes (15% <= strains < 95%) 4144
Cloud genes (0% <= strains < 15%) 26604
Total genes (0% <= strains <= 100%) 30863
Wait, your organisms share 0 genes? Are you sure they're all TB?
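For anyone who wants to check this automatically, here is a rough sketch that reads the roary summary and flags a run with too few core genes. It assumes each line ends with the count, as in the output pasted above; `parse_roary_summary` is a hypothetical helper, and comparing the core gene count against the 1500 default of roary_min_genes is my assumption about how that threshold is applied.

```python
def parse_roary_summary(text):
    """Map each category name (e.g. 'Core genes') to its gene count.

    Assumes every non-empty line starts with the category name, followed
    by a parenthesised range, and ends with the integer count.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name = line.split("(", 1)[0].strip()
        counts[name] = int(line.split()[-1])
    return counts


summary = """\
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 115
Shell genes (15% <= strains < 95%) 4144
Cloud genes (0% <= strains < 15%) 26604
Total genes (0% <= strains <= 100%) 30863
"""

counts = parse_roary_summary(summary)
min_genes = 1500  # assumed threshold, taken from the roary_min_genes default
too_few_core = counts["Core genes"] < min_genes
print(counts["Core genes"], too_few_core)  # -> 0 True
```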
That was what was advertised.
I have a hunch that one of your samples is wrong.
I'm unsure if there are any mycobacteria in the mash reference, but maybe. What does the grandeur/mash/mash_summary* file look like? Are your files showing matches to the expected organisms?
Maybe there's something in mlst? I can't remember if there's an mlst scheme for TB.
Maybe there's something in Kraken2 results... Have you tried running the reads through Kraken2?
You could also download some reference genomes from NCBI and use them in fastani. I have some instructions on how to do this here https://github.com/UPHL-BioNGS/Grandeur/wiki/fastani.
Also, if these samples are in the SRA, there should be KRONA results. This is an example that I found while searching the SRA: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR26159034&display=analysis
They are all highly similar to one or more M tuberculosis taxa in mash except for one sample (2023TB-0071-3), which only has low level similarity to Gordonia_sputi_NBRC_100414.
Should I just take out 2023TB-0071-3?
Yes. I don't think 2023TB-0071-3 is what was advertised.
What steps do I need to take, if possible, to remove 2023TB-0071-3 and resume cleanly?
For that, you'll want to move the offending sample's fastq files out of 'reads' and then use -resume. Nextflow will then reuse the results still in its cache and rerun only the processes that are not in its cache.
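If it helps, moving the files can be scripted. This is only a sketch: `set_aside_sample` and the `excluded_reads` directory are hypothetical names, and the fastq file names in the demo are made up to show the behaviour. It matches on a filename prefix, so a short sample ID could also match longer IDs that start with the same characters.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_sample(reads_dir, sample, hold_dir):
    """Move every file in reads_dir whose name starts with `sample` into hold_dir.

    Returns the sorted names of the files that were moved.
    """
    reads_dir, hold_dir = Path(reads_dir), Path(hold_dir)
    hold_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(reads_dir.glob(f"{sample}*")):
        if path.is_file():
            shutil.move(str(path), str(hold_dir / path.name))
            moved.append(path.name)
    return moved


# Demo in a throwaway directory with stand-in (empty) fastq files.
with tempfile.TemporaryDirectory() as tmp:
    reads = Path(tmp) / "reads"
    reads.mkdir()
    for name in (
        "2023TB-0070-3_R1.fastq.gz",
        "2023TB-0071-3_R1.fastq.gz",
        "2023TB-0071-3_R2.fastq.gz",
    ):
        (reads / name).touch()
    moved = set_aside_sample(reads, "2023TB-0071-3", Path(tmp) / "excluded_reads")
    remaining = sorted(p.name for p in reads.iterdir())

print(moved)
print(remaining)
```

After the files are moved, rerun the same nextflow command with -resume added.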
I added kraken, and resumed, but it is starting over....
Grandeur errored out again. I tried using the contigs files produced by spades, but I am still getting the following error:
Oct-02 16:18:47.099 [Actor Thread 217] ERROR nextflow.extension.OperatorImpl - @unknown
java.lang.NullPointerException: Cannot invoke method split() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:44)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at Script_2cc42974$_runScript_closure1$_closure2$_closure3.doCall(Script_2cc42974:19)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:53)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:137)
    at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:56)
    at jdk.internal.reflect.GeneratedMethodAccessor248.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at groovy.lang.Closure.call(Closure.java:412)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
    at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
    at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
    at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
    at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
    at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
The size results indicate that there are three different taxa: M tuberculosis, M intracellulare, and M bovis:
sample,genus,species,accession,size,datasets_size,expected_size,mash_size,quast_size
2023TB-0068-3,Mycobacterium,tuberculosis,refseq-NZ-1423567-PRJNA224116-SAMN02414983-GCF_000679675.1-.-Mycobacterium,4368301,,,7086710,4368301
2023TB-0069-3,Mycobacterium,tuberculosis,refseq-NC-395095-PRJNA224116-SAMN03081423-GCF_000153685.2-.-Mycobacterium,4370652,,,6601180,4370652
2023TB-0070-3,Mycobacterium,tuberculosis,refseq-NZ-1402592-PRJNA224116-SAMN02364085-GCF_000668035.1-.-Mycobacterium,4406453,,,5090990,4406453
2023TB-0072-3,Mycobacterium,tuberculosis,refseq-NZ-1423573-PRJNA224116-SAMN02414989-GCF_000679775.1-.-Mycobacterium,4365461,,,6699100,4365461
2023TB-0073-3,Mycobacterium,tuberculosis,refseq-NZ-1448837-PRJNA224116-SAMN02599183-GCF_000665805.1-.-Mycobacterium,4380167,,,6156410,4380167
2023TB-0074-3,Mycobacterium,tuberculosis,refseq-NZ-1408951-PRJNA224116-SAMN02381028-GCF_000668875.1-.-Mycobacterium,4399422,,,6601180,4399422
2023TB-0076-3,Mycobacterium,tuberculosis,refseq-NZ-1448455-PRJNA224116-SAMN02585950-GCF_000652275.1-.-Mycobacterium,4382335,,,6789620,4382335
2023TB-0077-3,Mycobacterium,tuberculosis,refseq-NZ-1448760-PRJNA224116-SAMN02599106-GCF_000664465.1-.-Mycobacterium,4395249,,,6947300,4395249
2023TB-0079-3,Mycobacterium,tuberculosis,refseq-NZ-1438827-PRJNA224116-SAMN02567766-GCF_000649715.1-.-Mycobacterium,4387629,,,6951250,4387629
2023TB-0080-3,Mycobacterium,tuberculosis,refseq-NZ-1324224-PRJNA224116-SAMN02053747-GCF_000660365.1-.-Mycobacterium,4363197,,,6074630,4363197
2023TB-0081-3,Mycobacterium,tuberculosis,refseq-NZ-1448723-PRJNA224116-SAMN02599069-GCF_000663785.1-.-Mycobacterium,4366878,,,6213830,4366878
2023TB-0082,Mycobacterium,tuberculosis,refseq-NZ-1427233-PRJNA224116-SAMN02419589-GCF_000680855.1-.-Mycobacterium,4364487,,,6570050,4364487
2023TB-0083,Mycobacterium,intracellulare,refseq-NZ-1335421-PRJNA224116-SAMN02641626-GCF_000524015.1-.-Mycobacterium,11036365,,,14271600,11036365
2023TB-0084,Mycobacterium,tuberculosis,refseq-NZ-1402592-PRJNA224116-SAMN02364085-GCF_000668035.1-.-Mycobacterium,6371223,,,13907100,6371223
2023TB-0085,Mycobacterium,tuberculosis,refseq-NZ-1427321-PRJNA224116-SAMN02419677-GCF_000656255.1-.-Mycobacterium,4676034,,,8920340,4676034
2023TB-0086,Mycobacterium,tuberculosis,refseq-NZ-1408939-PRJNA224116-SAMN02381016-GCF_000668655.1-.-Mycobacterium,4384713,,,5578010,4384713
2023TB-0087,Mycobacterium,tuberculosis,refseq-NZ-1402588-PRJNA224116-SAMN02364081-GCF_000667955.1-.-Mycobacterium,4365827,,,6639600,4365827
2023TB-0088,Mycobacterium,tuberculosis,refseq-NZ-1324233-PRJNA224116-SAMN02053756-GCF_000660545.1-.-Mycobacterium,4401301,,,5944600,4401301
2023TB-0089,Mycobacterium,tuberculosis,refseq-NZ-1417020-PRJNA224116-SAMN02398705-NZ_JLNH-.-Mycobacterium,4367024,,,6364970,4367024
2023TB-0090,Mycobacterium,tuberculosis,refseq-NC-395095-PRJNA224116-SAMN03081423-GCF_000153685.2-.-Mycobacterium,4369835,,,6061060,4369835
2023TB-0091,Mycobacterium,tuberculosis,refseq-NZ-1354127-PRJNA224116-SAMN02231157-GCF_000667125.1-.-Mycobacterium,4381615,,,6387860,4381615
2023TB-0092,Mycobacterium,tuberculosis,refseq-NZ-1267360-PRJNA224116-SAMN01828247-GCF_000659105.1-.-Mycobacterium,4603507,,,7539900,4603507
2023TB-0094-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4299915,,,6865320,4299915
2023TB-0095-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4301140,,,5935100,4301140
2023TB-0096-3,Mycobacterium,tuberculosis,refseq-NZ-1773-PRJNA224116-SAMN02673326-GCF_000666085.1-.-Mycobacterium,4371550,,,7073920,4371550
2023TB-0097-3,Mycobacterium,tuberculosis,refseq-NZ-1354151-PRJNA224116-SAMN02231121-GCF_000666525.1-.-Mycobacterium,4377350,,,7000040,4377350
2023TB-0098-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4301726,,,6738720,4301726
2023TB-0100-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4300012,,,6684950,4300012
2023TB-0113,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03288261-GCF_000934325.1-.-Mycobacterium,4321907,,,5277580,4321907
2023TB-0114,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03290670-GCF_000878485.1-.-Mycobacterium,4330876,,,6229390,4330876
2023TB-0115,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4318315,,,6608810,4318315
2023TB-0116,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4324909,,,6352530,4324909
2023TB-0117,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4318196,,,5947660,4318196
2023TB-0122,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4325864,,,6406550,4325864
2023TB-0123,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4322108,,,5698950,4322108
2023TB-0124,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4314259,,,6453350,4314259
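To split the samples into per-species groups before rerunning, something like the following could work. It is a sketch that relies only on the `sample`, `genus`, and `species` columns shown in the size results above; `samples_by_species` is a hypothetical helper, and the demo uses three rows copied from that table, one per species.

```python
import csv
import io
from collections import defaultdict


def samples_by_species(csv_text):
    """Group sample names by 'genus species' using the size results columns."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[f"{row['genus']} {row['species']}"].append(row["sample"])
    return dict(groups)


# Three rows copied from the size results, one per species.
size_results = """\
sample,genus,species,accession,size,datasets_size,expected_size,mash_size,quast_size
2023TB-0068-3,Mycobacterium,tuberculosis,refseq-NZ-1423567-PRJNA224116-SAMN02414983-GCF_000679675.1-.-Mycobacterium,4368301,,,7086710,4368301
2023TB-0083,Mycobacterium,intracellulare,refseq-NZ-1335421-PRJNA224116-SAMN02641626-GCF_000524015.1-.-Mycobacterium,11036365,,,14271600,11036365
2023TB-0113,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03288261-GCF_000934325.1-.-Mycobacterium,4321907,,,5277580,4321907
"""

groups = samples_by_species(size_results)
for species, samples in groups.items():
    print(species, len(samples), samples)
```

Each group could then be placed in its own reads directory and run separately.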
This is a nextflow error. What command did you use to put files through Grandeur?
For the fastqs:
nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --reads reads --outgroup NC_000962.3.fna
For the contigs:
nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --fastas fastas --outgroup NC_000962.3.fna
Is the following command being run in the same directory that has reads in it?
nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --fastas fastas --outgroup NC_000962.3.fna
If so, it's grabbing all the fastq files, too.
No. There are no reads in the reads dir when running the fastas. And vice-versa.
Could you share the full error message?
Also, if you think you have two different types of mycobacterium (i.e. Mycobacterium tuberculosis and Mycobacterium bovis), you're going to want to just focus on one.
Grandeur isn't really intended to be used for genomic comparisons, but rather suspected clonal expansions (aka outbreaks).
Still, it can be used for any rational core-genome comparison, and you can lower the number of genes required for iqtree2 by setting the roary_min_genes parameter to something lower (the default is params.roary_min_genes = 1500).
Here are the 2 nextflow logs.
nextflow_fastqs.log.gz
nextflow_contigs.log.gz
Do you have any files generated in grandeur/fastani (or whatever you chose your outdir to be)?
Yes, but fastani_summary.csv is blank except for the header.
I think that might have caused the error. I didn't test that. Just a second while I get together the commands for a workaround.
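A quick way to confirm that a summary file has a header but no results is sketched below. `has_data_rows` is a hypothetical helper, and the column names in the demo are placeholders, since the real header of fastani_summary.csv is not shown in this thread.

```python
import csv
import io


def has_data_rows(csv_text):
    """True when the CSV text has at least one non-empty row after the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header, if there is one
    return any(any(cell.strip() for cell in row) for row in reader)


# Placeholder column names; the real fastani_summary.csv header may differ.
header_only = "sample,reference,ani\n"
with_result = "sample,reference,ani\n2023TB-0068-3,some_reference,99.9\n"

print(has_data_rows(header_only))  # -> False
print(has_data_rows(with_result))  # -> True
```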
Alright, this is a bug that I need to fix in the future, but there is a current workaround by adding in a file/files from ncbi to use as a fastani reference.
Workaround 1: set the current_datasets parameter to true. This will use NCBI datasets to download representative genomes of the organisms identified. This sometimes has runtime issues in cloud environments, however.
nextflow run UPHL-BioNGS/Grandeur <what you normally put here> -resume --current_datasets true
Workaround 2 : add relevant genomes for fastani
This involves downloading a fasta file from ncbi and putting it where you can access it locally. Here's a webpage for Mycobacterium tuberculosis : https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=1773
The reference genome for this can be downloaded using the following command:
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000195955.2/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=GCF_000195955.2.zip" -H "Accept: application/zip"
Then you will need to unzip the downloaded file
unzip GCF_000195955.2.zip
The fasta file will be located at ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna
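If the archive layout ever differs from the path above, the fasta can be located by listing the zip contents instead of hard-coding the path. This is a sketch using only the standard library; `genomic_fasta_members` is a hypothetical helper, and the demo builds a tiny stand-in zip rather than downloading anything.

```python
import os
import tempfile
import zipfile


def genomic_fasta_members(zip_path):
    """List archive members whose names end in '_genomic.fna'."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.endswith("_genomic.fna")]


# Demo: a stand-in archive that mimics the member path quoted above.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "GCF_000195955.2.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("README.md", "stand-in\n")
        archive.writestr(
            "ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna",
            ">stand-in\nACGT\n",
        )
    found = genomic_fasta_members(zip_path)

print(found)
```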
You can then use this fasta file in fastani with the following command:
nextflow run UPHL-BioNGS/Grandeur <what you normally put here> -resume --fastani_ref ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna
More information for how to add fastani references can be found here : https://github.com/UPHL-BioNGS/Grandeur/wiki/fastani
Let me know if that fixes your issues!!!
Workaround 1 worked. Thanks!
Grandeur finished with errors (see attached): 1) ValueError - only 1 label for 39 samples; 2) the SNP matrices were all 0's.
End of Grandeur TB nextflow log.txt