UPHL-BioNGS / Grandeur

UPHL's Reference Free Pipeline
GNU General Public License v3.0

Value labels and snp-dists failure #129

Closed DrB-S closed 11 months ago

DrB-S commented 11 months ago

Grandeur finished with errors (see attached): 1) ValueError: only 1 label for 39 samples; 2) the snp matrices were all 0's.
End of Grandeur TB nextflow log.txt

erinyoung commented 11 months ago

Do you still get this error when you use your version of HeatCluster?

It looks like the error is due to this line:

within_cluster_snps = sorted_df.apply(lambda row: row[row < 500].sum(), axis=1)

# Add 'Within_Cluster_SNPs' column to the sorted DataFrame
sorted_df['Within_Cluster_SNPs'] = within_cluster_snps.values

# Calculate silhouette scores for different numbers of clusters
silhouette_scores = []

if numSamples < 11:
    upper_range = numSamples
else:
    upper_range = 11

for n_clusters in range(2, upper_range):
    kmeans = KMeans(n_clusters=n_clusters, n_init=10)
    cluster_labels = kmeans.fit_predict(sorted_df.values)
    silhouette_scores.append(silhouette_score(sorted_df.values, cluster_labels))

Do you have any snp distances < 500?
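For context, scikit-learn's silhouette_score raises a ValueError when every sample ends up with the same label, which is what happens when all rows of the matrix are identical. A minimal sketch of a guard for the loop above, in plain Python: the function name candidate_cluster_counts is hypothetical and this is not HeatCluster's actual fix, only one way the range of n_clusters could be limited.

```python
def candidate_cluster_counts(matrix, cap=11):
    """Return the n_clusters values worth trying for a square distance matrix.

    Hypothetical helper (not part of HeatCluster). KMeans cannot produce more
    distinct labels than there are distinct rows, and the silhouette score
    needs between 2 and n_samples - 1 labels.
    """
    n_samples = len(matrix)
    n_distinct = len({tuple(row) for row in matrix})
    upper = min(cap, n_samples, n_distinct + 1)
    return list(range(2, upper))


# An all-zero 39 x 39 matrix, like the one reported in this issue
all_zero = [[0] * 39 for _ in range(39)]
print(candidate_cluster_counts(all_zero))  # []
```

With an empty list there is nothing to loop over, so the silhouette step would simply be skipped instead of failing.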

erinyoung commented 11 months ago

This snp matrix should definitely be added to the tests at https://github.com/DrB-S/HeatCluster/tree/main/test

DrB-S commented 11 months ago

snp_matrix.txt has nothing but 0's as values.

DrB-S commented 11 months ago

snp-dists 0.8.2,2023TB-0068-3,2023TB-0069-3,2023TB-0070-3,2023TB-0071-3,2023TB-0072-3,2023TB-0073-3,2023TB-0074-3,2023TB-0075-3,2023TB-0076-3,2023TB-0077-3,2023TB-0079-3,2023TB-0080-3,2023TB-0081-3,2023TB-0082,2023TB-0083,2023TB-0084,2023TB-0085,2023TB-0086,2023TB-0087,2023TB-0088,2023TB-0089,2023TB-0090,2023TB-0091,2023TB-0092,2023TB-0094-3,2023TB-0095-3,2023TB-0096-3,2023TB-0097-3,2023TB-0098-3,2023TB-0100-3,2023TB-0113,2023TB-0114,2023TB-0115,2023TB-0116,2023TB-0117,2023TB-0122,2023TB-0123,2023TB-0124,IS7493_GCF_000221985.1.gz
2023TB-0068-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0069-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0070-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0071-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0072-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0073-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0074-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0075-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0076-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0077-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0079-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0080-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0081-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0082,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0083,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0084,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0085,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0086,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0087,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0088,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0089,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0090,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0091,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0092,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0094-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0095-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0096-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0097-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0098-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0100-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0113,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0114,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0115,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0117,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023TB-0124,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
IS7493_GCF_000221985.1.gz,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
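A matrix like this can be flagged before it reaches any clustering step. The following is a small sketch, assuming the comma-separated layout shown above (a header row whose first cell is the snp-dists version, then one row per sample); the function name all_distances_zero is made up for illustration.

```python
import csv
import io


def all_distances_zero(csv_text):
    """Return True if every distance in a snp-dists CSV matrix is 0.

    Hypothetical helper. Assumes the first row is the header and the first
    column of each remaining row is the sample name.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    body = rows[1:]
    if not body:
        raise ValueError("matrix has no sample rows")
    return all(int(value) == 0 for row in body for value in row[1:])


# Two-sample excerpt in the same layout as the matrix above
example = """snp-dists 0.8.2,2023TB-0068-3,2023TB-0069-3
2023TB-0068-3,0,0
2023TB-0069-3,0,0
"""
print(all_distances_zero(example))  # True
```

An all-zero result across this many samples usually points at the alignment step rather than at the samples being truly identical, which is where the thread goes next.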

DrB-S commented 11 months ago

I didn't point to any config file in my command line:

nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 90 --maxcpus 120 --reads reads --outgroup NC_000962.3.fna

erinyoung commented 11 months ago

Oh snap!

What is the roary/summary_statistics.txt?

DrB-S commented 11 months ago

Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 115
Shell genes (15% <= strains < 95%) 4144
Cloud genes (0% <= strains < 15%) 26604
Total genes (0% <= strains <= 100%) 30863

erinyoung commented 11 months ago

Wait, your organisms share 0 genes? Are you sure they're all TB?

DrB-S commented 11 months ago

That was what was advertised.

erinyoung commented 11 months ago

I have a hunch that one of your samples is wrong.

I'm unsure if there are any mycobacteria in the mash reference, but maybe. What does the grandeur/mash/mash_summary* file look like? Are your files showing matches to the expected organisms?

Maybe there's something in mlst? I can't remember if there's an mlst scheme for TB.

Maybe there's something in Kraken2 results... Have you tried running the reads through Kraken2?

You could also download some reference genomes from NCBI and use them in fastani. I have some instructions on how to do this here https://github.com/UPHL-BioNGS/Grandeur/wiki/fastani.

Also, if these samples are in the SRA, there should be KRONA results. This is an example that I found while searching the SRA: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR26159034&display=analysis

DrB-S commented 11 months ago

They are all highly similar to one or more M tuberculosis taxa in mash except for one sample (2023TB-0071-3), which only has low level similarity to Gordonia_sputi_NBRC_100414.

DrB-S commented 11 months ago

Should I just take out 2023TB-0071-3?

erinyoung commented 11 months ago

Yes. I don't think 2023TB-0071-3 is what was advertised.

DrB-S commented 11 months ago

What steps do I need to take, if possible, to remove 2023TB-0071-3 and resume cleanly?

erinyoung commented 11 months ago

For that, you'll want to move the offending sample's fastq files out of 'reads' and then use -resume. Nextflow will then reuse the results still in its cache and rerun only the processes that are not in its cache.
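As a rough sketch of that first step, the files could be set aside with a few lines of Python. The function name, the excluded_reads directory, and the assumption that fastq file names start with the sample ID are all illustrative and not part of Grandeur.

```python
import shutil
from pathlib import Path


def set_aside_sample(reads_dir, sample_id, holding_dir):
    """Move every file whose name starts with sample_id out of reads_dir.

    Hypothetical helper. Assumes file names begin with the sample ID, so a
    prefix that is shared by another sample would match that sample too.
    """
    reads_dir, holding_dir = Path(reads_dir), Path(holding_dir)
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for fastq in sorted(reads_dir.glob(f"{sample_id}*")):
        shutil.move(str(fastq), str(holding_dir / fastq.name))
        moved.append(fastq.name)
    return moved


# Example call (directory names are assumptions):
#   set_aside_sample("reads", "2023TB-0071-3", "excluded_reads")
# followed by the original nextflow command with -resume appended.
```

Moving the files rather than deleting them keeps the sample available for a separate run once its identity is sorted out.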

DrB-S commented 11 months ago

I added kraken, and resumed, but it is starting over....

DrB-S commented 11 months ago

Grandeur errored out again. I tried using the contigs files produced by spades, but I am still getting the following error:

Oct-02 16:18:47.099 [Actor Thread 217] ERROR nextflow.extension.OperatorImpl - @unknown
java.lang.NullPointerException: Cannot invoke method split() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:44)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at Script_2cc42974$_runScript_closure1$_closure2$_closure3.doCall(Script_2cc42974:19)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:53)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:137)
    at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:56)
    at jdk.internal.reflect.GeneratedMethodAccessor248.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
    at groovy.lang.Closure.call(Closure.java:412)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
    at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
    at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
    at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
    at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
    at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

DrB-S commented 11 months ago

The size results indicate that there are three different taxa: M tuberculosis, M intracellulare, and M bovis:

sample,genus,species,accession,size,datasets_size,expected_size,mash_size,quast_size
2023TB-0068-3,Mycobacterium,tuberculosis,refseq-NZ-1423567-PRJNA224116-SAMN02414983-GCF_000679675.1-.-Mycobacterium,4368301,,,7086710,4368301
2023TB-0069-3,Mycobacterium,tuberculosis,refseq-NC-395095-PRJNA224116-SAMN03081423-GCF_000153685.2-.-Mycobacterium,4370652,,,6601180,4370652
2023TB-0070-3,Mycobacterium,tuberculosis,refseq-NZ-1402592-PRJNA224116-SAMN02364085-GCF_000668035.1-.-Mycobacterium,4406453,,,5090990,4406453
2023TB-0072-3,Mycobacterium,tuberculosis,refseq-NZ-1423573-PRJNA224116-SAMN02414989-GCF_000679775.1-.-Mycobacterium,4365461,,,6699100,4365461
2023TB-0073-3,Mycobacterium,tuberculosis,refseq-NZ-1448837-PRJNA224116-SAMN02599183-GCF_000665805.1-.-Mycobacterium,4380167,,,6156410,4380167
2023TB-0074-3,Mycobacterium,tuberculosis,refseq-NZ-1408951-PRJNA224116-SAMN02381028-GCF_000668875.1-.-Mycobacterium,4399422,,,6601180,4399422
2023TB-0076-3,Mycobacterium,tuberculosis,refseq-NZ-1448455-PRJNA224116-SAMN02585950-GCF_000652275.1-.-Mycobacterium,4382335,,,6789620,4382335
2023TB-0077-3,Mycobacterium,tuberculosis,refseq-NZ-1448760-PRJNA224116-SAMN02599106-GCF_000664465.1-.-Mycobacterium,4395249,,,6947300,4395249
2023TB-0079-3,Mycobacterium,tuberculosis,refseq-NZ-1438827-PRJNA224116-SAMN02567766-GCF_000649715.1-.-Mycobacterium,4387629,,,6951250,4387629
2023TB-0080-3,Mycobacterium,tuberculosis,refseq-NZ-1324224-PRJNA224116-SAMN02053747-GCF_000660365.1-.-Mycobacterium,4363197,,,6074630,4363197
2023TB-0081-3,Mycobacterium,tuberculosis,refseq-NZ-1448723-PRJNA224116-SAMN02599069-GCF_000663785.1-.-Mycobacterium,4366878,,,6213830,4366878
2023TB-0082,Mycobacterium,tuberculosis,refseq-NZ-1427233-PRJNA224116-SAMN02419589-GCF_000680855.1-.-Mycobacterium,4364487,,,6570050,4364487
2023TB-0083,Mycobacterium,intracellulare,refseq-NZ-1335421-PRJNA224116-SAMN02641626-GCF_000524015.1-.-Mycobacterium,11036365,,,14271600,11036365
2023TB-0084,Mycobacterium,tuberculosis,refseq-NZ-1402592-PRJNA224116-SAMN02364085-GCF_000668035.1-.-Mycobacterium,6371223,,,13907100,6371223
2023TB-0085,Mycobacterium,tuberculosis,refseq-NZ-1427321-PRJNA224116-SAMN02419677-GCF_000656255.1-.-Mycobacterium,4676034,,,8920340,4676034
2023TB-0086,Mycobacterium,tuberculosis,refseq-NZ-1408939-PRJNA224116-SAMN02381016-GCF_000668655.1-.-Mycobacterium,4384713,,,5578010,4384713
2023TB-0087,Mycobacterium,tuberculosis,refseq-NZ-1402588-PRJNA224116-SAMN02364081-GCF_000667955.1-.-Mycobacterium,4365827,,,6639600,4365827
2023TB-0088,Mycobacterium,tuberculosis,refseq-NZ-1324233-PRJNA224116-SAMN02053756-GCF_000660545.1-.-Mycobacterium,4401301,,,5944600,4401301
2023TB-0089,Mycobacterium,tuberculosis,refseq-NZ-1417020-PRJNA224116-SAMN02398705-NZ_JLNH-.-Mycobacterium,4367024,,,6364970,4367024
2023TB-0090,Mycobacterium,tuberculosis,refseq-NC-395095-PRJNA224116-SAMN03081423-GCF_000153685.2-.-Mycobacterium,4369835,,,6061060,4369835
2023TB-0091,Mycobacterium,tuberculosis,refseq-NZ-1354127-PRJNA224116-SAMN02231157-GCF_000667125.1-.-Mycobacterium,4381615,,,6387860,4381615
2023TB-0092,Mycobacterium,tuberculosis,refseq-NZ-1267360-PRJNA224116-SAMN01828247-GCF_000659105.1-.-Mycobacterium,4603507,,,7539900,4603507
2023TB-0094-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4299915,,,6865320,4299915
2023TB-0095-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4301140,,,5935100,4301140
2023TB-0096-3,Mycobacterium,tuberculosis,refseq-NZ-1773-PRJNA224116-SAMN02673326-GCF_000666085.1-.-Mycobacterium,4371550,,,7073920,4371550
2023TB-0097-3,Mycobacterium,tuberculosis,refseq-NZ-1354151-PRJNA224116-SAMN02231121-GCF_000666525.1-.-Mycobacterium,4377350,,,7000040,4377350
2023TB-0098-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4301726,,,6738720,4301726
2023TB-0100-3,Mycobacterium,tuberculosis,refseq-NZ-1423432-PRJNA224116-SAMN02414848-GCF_000677255.1-.-Mycobacterium,4300012,,,6684950,4300012
2023TB-0113,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03288261-GCF_000934325.1-.-Mycobacterium,4321907,,,5277580,4321907
2023TB-0114,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03290670-GCF_000878485.1-.-Mycobacterium,4330876,,,6229390,4330876
2023TB-0115,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4318315,,,6608810,4318315
2023TB-0116,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4324909,,,6352530,4324909
2023TB-0117,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4318196,,,5947660,4318196
2023TB-0122,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4325864,,,6406550,4325864
2023TB-0123,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4322108,,,5698950,4322108
2023TB-0124,Mycobacterium,bovis,refseq-NC-233413-PRJNA57695-.-.-.-Mycobacterium_bovis,4314259,,,6453350,4314259
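To see the split at a glance, the genus and species columns of a table like this can be tallied. This is a minimal sketch using only the standard library; the function name species_counts is made up, and the three rows in the excerpt are copied from the results above.

```python
import csv
import io
from collections import Counter


def species_counts(csv_text):
    """Count samples per 'genus species' in a size-results CSV.

    Hypothetical helper. Assumes the header contains 'genus' and 'species'
    columns, as in the table above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(f"{row['genus']} {row['species']}" for row in reader)


# Three-row excerpt of the table above
excerpt = """sample,genus,species,accession,size,datasets_size,expected_size,mash_size,quast_size
2023TB-0068-3,Mycobacterium,tuberculosis,refseq-NZ-1423567-PRJNA224116-SAMN02414983-GCF_000679675.1-.-Mycobacterium,4368301,,,7086710,4368301
2023TB-0083,Mycobacterium,intracellulare,refseq-NZ-1335421-PRJNA224116-SAMN02641626-GCF_000524015.1-.-Mycobacterium,11036365,,,14271600,11036365
2023TB-0113,Mycobacterium,bovis,refseq-NZ-1765-PRJNA224116-SAMN03288261-GCF_000934325.1-.-Mycobacterium,4321907,,,5277580,4321907
"""
for name, count in species_counts(excerpt).items():
    print(name, count)
```

Run on the full table, the same tally would show which samples would need to be separated before a core-genome comparison.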

erinyoung commented 11 months ago

The NullPointerException you posted above (Cannot invoke method split() on null object) is a nextflow error. What command did you use to put files through Grandeur?

DrB-S commented 11 months ago

For the fastqs:

nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --reads reads --outgroup NC_000962.3.fna

For the contigs:

nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --fastas fastas --outgroup NC_000962.3.fna

erinyoung commented 11 months ago

Is

nextflow run UPHL-BioNGS/Grandeur -profile singularity,msa --medcpus 50 --maxcpus 90 --fastas fastas --outgroup NC_000962.3.fna

being run in the same directory that has reads in it?

If so, it's grabbing all the fastq files, too.

DrB-S commented 11 months ago

No. There are no reads in the reads dir when running the fastas. And vice-versa.

erinyoung commented 11 months ago

Could you share the full error message?

erinyoung commented 11 months ago

Also, if you think you have two different types of mycobacterium (i.e. Mycobacterium tuberculosis and Mycobacterium bovis), you're going to want to just focus on one.

Grandeur isn't really intended to be used for general genomic comparisons, but rather for suspected clonal expansions (aka outbreaks).

Still, it can be used for any rational core-genome comparison, and you can lower the number of genes required for iqtree2 by setting the roary_min_genes parameter to something lower (the default is params.roary_min_genes = 1500).
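To check whether a run would clear that threshold, the core gene count can be read out of roary/summary_statistics.txt. The sketch below assumes each line ends with the count, as in the summary posted earlier in this thread, and that the threshold is compared against the "Core genes" row; both function names are hypothetical and this is not how Grandeur itself performs the check.

```python
def parse_roary_summary(text):
    """Map each category label in a Roary summary to its gene count.

    Hypothetical helper. Assumes the label precedes the '(' of the strain
    range and that the count is the last whitespace-separated token.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label = line.split("(")[0].strip()
        counts[label] = int(line.split()[-1])
    return counts


def enough_core_genes(counts, min_genes=1500):
    """Assumed comparison: core gene count must reach min_genes."""
    return counts.get("Core genes", 0) >= min_genes


# Summary statistics reported earlier in this thread
summary = """Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 115
Shell genes (15% <= strains < 95%) 4144
Cloud genes (0% <= strains < 15%) 26604
Total genes (0% <= strains <= 100%) 30863
"""
counts = parse_roary_summary(summary)
print(counts["Core genes"], enough_core_genes(counts))  # 0 False
```

With 0 core genes, no value of roary_min_genes would help here, which is why removing the off-target samples comes first.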

DrB-S commented 11 months ago

Here are the 2 nextflow logs.
nextflow_fastqs.log.gz nextflow_contigs.log.gz

erinyoung commented 11 months ago

Do you have any files generated in grandeur/fastani ? (or whatever you chose your outdir to be)

DrB-S commented 11 months ago

Yes, but fastani_summary.csv is blank except for the header.

erinyoung commented 11 months ago

I think that might have caused the error. I didn't test that. Just a second while I get together the commands for a workaround.
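A quick way to confirm that a summary file has a header but no data is to count its non-empty rows. This is only a sketch; the function name and the header used in the example are placeholders, not the real fastani_summary.csv columns.

```python
import csv
import io


def count_data_rows(csv_text):
    """Return the number of non-empty rows after the header line.

    Hypothetical helper for spotting header-only summary files.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


# Placeholder header; the real column names are not shown in this thread
header_only = "sample,reference,ani\n"
print(count_data_rows(header_only))  # 0
```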

erinyoung commented 11 months ago

Alright, this is a bug that I need to fix in the future, but for now there is a workaround: supply one or more genomes from NCBI to use as a fastani reference.

Workaround 1 : set the current_datasets parameter to true. This will use ncbi datasets to download representative genomes of the organisms identified. This sometimes has runtime issues in cloud environments, however.

nextflow run UPHL-BioNGS/Grandeur <what you normally put here> -resume --current_datasets true

Workaround 2 : add relevant genomes for fastani

This involves downloading a fasta file from ncbi and putting it where you can access it locally. Here's a webpage for Mycobacterium tuberculosis : https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=1773

The reference genome for this can be downloaded using the following command:

curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000195955.2/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=GCF_000195955.2.zip" -H "Accept: application/zip"

Then you will need to unzip the downloaded file:

unzip GCF_000195955.2.zip

The fasta file will be located at ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna
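If you would rather pull only the genome fasta out of the archive, Python's zipfile module can do that. A minimal sketch, assuming the archive layout described above (a member ending in _genomic.fna under ncbi_dataset/data/); the function name extract_genomic_fasta is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_genomic_fasta(zip_path, dest_dir="."):
    """Extract only the *_genomic.fna members from an NCBI datasets zip.

    Hypothetical helper. Returns the paths of the extracted fasta files.
    """
    with zipfile.ZipFile(zip_path) as archive:
        members = [m for m in archive.namelist() if m.endswith("_genomic.fna")]
        archive.extractall(dest_dir, members=members)
    return [Path(dest_dir) / member for member in members]


# Example call, using the file name from the curl command above:
#   extract_genomic_fasta("GCF_000195955.2.zip")
```

The returned path is what would then be passed to Grandeur as the fastani reference.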

You can then use this fasta file in fastani with the following command:

nextflow run UPHL-BioNGS/Grandeur <what you normally put here> -resume --fastani_ref ncbi_dataset/data/GCF_000195955.2/GCF_000195955.2_ASM19595v2_genomic.fna

More information for how to add fastani references can be found here : https://github.com/UPHL-BioNGS/Grandeur/wiki/fastani

Let me know if that fixes your issues!!!

DrB-S commented 11 months ago

Workaround 1 worked. Thanks!