AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

mosek.Error: (1001) The license has expired. #67

Open jingydz opened 2 months ago

jingydz commented 2 months ago

I have successfully run AA on hundreds of samples in the past, but when I tried to run it again after a long break, I encountered this error: [MOSEK:ERROR] Error when using MOSEK: (1001) The license has expired. Do I need to obtain an updated license?

Additionally, I am not sure exactly what changed in my environment, because reconfiguring it took a long time. Now, when running each sample, I get the following warning:

/xxx/software/cnvkit/cnvlib/coverage.py:173: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[0.4892756 0.42481279 0.14973695 ... 0. 0. 0.]' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  table.loc[ok_idx, 'depth'] = (table.loc[ok_idx, 'basecount']

It seems there is a data type incompatibility at line 173 of coverage.py. Is it caused by an incorrect version of some package? Despite the warning, the program keeps running, all stages appear to complete successfully, and I also get the finish flag file:

$ cat sample/sample_AA_out/sample_finish_flag.txt
All stages completed

Is the result still reliable?

jluebeck commented 2 months ago

Hi,

To resolve the MOSEK license issue, please obtain an updated copy of the license and replace the old license file with the new one.
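The new license file has to go where MOSEK looks for it. A minimal sketch (an illustration, not part of AmpliconSuite) that reports that path, assuming MOSEK's usual conventions: the `MOSEKLM_LICENSE_FILE` environment variable if it is set, otherwise `$HOME/mosek/mosek.lic`. The helper name `mosek_license_path` is hypothetical, and license-server settings are not handled.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def mosek_license_path(environ: Optional[Mapping[str, str]] = None,
                       home: Optional[Path] = None) -> Path:
    """Return the license file MOSEK would most likely read (assumed conventions).

    MOSEKLM_LICENSE_FILE may also name a license server; that case is not handled.
    """
    env = os.environ if environ is None else environ
    explicit = env.get("MOSEKLM_LICENSE_FILE")
    if explicit:
        # An explicitly configured path takes precedence over the default location.
        return Path(explicit)
    # Assumed default location in the user's home directory.
    return (home or Path.home()) / "mosek" / "mosek.lic"


if __name__ == "__main__":
    lic = mosek_license_path()
    print(f"{lic} exists: {lic.is_file()}")
```

Copying the renewed `mosek.lic` over the file at the reported path should then replace the expired one.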

I am not fully certain about the issue with your CNVkit installation. Can I ask how you installed CNVkit and which version of Python you are running it with? It appears to be a deprecation warning, which is harmless for now but indicates that changing your Python or package versions in the future may cause an issue. As long as the CNV_CALLS.bed file for your sample in the cnvkit_output directory looks okay, I would say that it completed without issue.
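The warning is raised by pandas (2.1 and later) when fractional values are written into a subset of rows of an integer (`int64`) column. A minimal sketch, not CNVkit's actual code (the `compute_depth` helper and the toy numbers are hypothetical), showing the explicit cast the message asks for:

```python
import warnings

import pandas as pd


def compute_depth(table: pd.DataFrame, bin_length: int) -> pd.DataFrame:
    """Hypothetical helper: depth = basecount / bin_length for bins with reads."""
    ok_idx = table["basecount"] > 0
    # Cast the int64 column to float64 *before* writing fractional values into
    # some of its rows; this is the explicit cast the pandas warning requests.
    table["depth"] = table["depth"].astype("float64")
    table.loc[ok_idx, "depth"] = table.loc[ok_idx, "basecount"] / bin_length
    return table


df = pd.DataFrame({"basecount": [489, 424, 0], "depth": [0, 0, 0]})
with warnings.catch_warnings():
    # Turn the FutureWarning into an error so we would notice if it still fired.
    warnings.simplefilter("error", FutureWarning)
    out = compute_depth(df, bin_length=1000)
print(out["depth"].tolist())  # [0.489, 0.424, 0.0]
```

Without the `astype` line, recent pandas emits the same FutureWarning but still upcasts the column, which is consistent with the results being unaffected for now.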

Thanks, Jens

jingydz commented 2 months ago

The result file you referred to appears to be fine:

$ head -n 10 ./sample_AA_out/sample_cnvkit_output/sample_CNV_CALLS.bed
chr1 14941 44591 CNVkit 2.451371265417587
chr1 44591 138482 CNVkit 3.9670978979776517
chr1 138482 357863 CNVkit 1.935956182879508
chr1 357863 412290 CNVkit 1.416201942148882
chr1 412290 491456 CNVkit 1.0776181089191303
chr1 491456 535988 CNVkit 2.0746776539637786
chr1 585988 701085 CNVkit 1.2416581247938505
chr1 701085 1676912 CNVkit 1.8582337604322074
chr1 1676912 1746971 CNVkit 1.0195015720770932
chr1 1746971 2647734 CNVkit 1.930494842515606
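Beyond eyeballing `head`, a quick structural check of the whole file can catch obvious corruption. A minimal sketch, assuming the five whitespace-separated columns shown above (chromosome, start, end, caller, estimated copy number); the helper names are hypothetical and this only checks structure, not whether the calls are biologically correct:

```python
from typing import Iterable, List, Tuple

Call = Tuple[str, int, int, str, float]


def parse_cnv_calls(lines: Iterable[str]) -> List[Call]:
    """Parse lines of: chrom, start, end, caller, copy number (assumed layout)."""
    calls = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        chrom, start, end, caller, cn = line.split()[:5]
        calls.append((chrom, int(start), int(end), caller, float(cn)))
    return calls


def check_calls(calls: List[Call]) -> List[str]:
    """Return a list of problems; an empty list means nothing obviously wrong."""
    problems = []
    prev = None
    for chrom, start, end, _caller, cn in calls:
        if start >= end:
            problems.append(f"non-positive length: {chrom}:{start}-{end}")
        if cn < 0:
            problems.append(f"negative copy number {cn} at {chrom}:{start}-{end}")
        # Within a chromosome, intervals should be sorted and non-overlapping.
        if prev is not None and prev[0] == chrom and start < prev[2]:
            problems.append(f"overlapping or unsorted: {chrom}:{start}-{end}")
        prev = (chrom, start, end)
    return problems


rows = [
    "chr1 14941 44591 CNVkit 2.451371265417587",
    "chr1 44591 138482 CNVkit 3.9670978979776517",
    "chr1 138482 357863 CNVkit 1.935956182879508",
]
print(check_calls(parse_cnv_calls(rows)))  # []
```

For a real file, `check_calls(parse_cnv_calls(open(path)))` runs the same checks over every line.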

I am puzzled that, out of approximately 500 samples (all of them tumor samples), only five were found to contain ecDNA, and each of those had just one ecDNA. This does not match my understanding that "ecDNA is often present in tumors." Did I miss some steps?

time /xxx/miniconda3/bin/python3 /xxx/AmpliconSuite-pipeline/PrepareAA.py \
  -s sample -t 4 \
  --cnvkit_dir /xxx/cnvkit/cnvkit.py --bam sample.bam \
  --ref GRCh38 -o sample_AA_out \
  --run_AA --run_AC
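For a cohort of this size, the same command can be generated per sample. A minimal sketch that reuses only the flags from the command above; the `/xxx/` paths are placeholders, the `{sample}.bam` naming is an assumption, and the finish-flag check assumes the `{sample}_AA_out/{sample}_finish_flag.txt` layout shown earlier in this thread:

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List

# Placeholder locations copied from the command above; substitute your own.
PYTHON = "/xxx/miniconda3/bin/python3"
PREPARE_AA = "/xxx/AmpliconSuite-pipeline/PrepareAA.py"
CNVKIT = "/xxx/cnvkit/cnvkit.py"


def build_cmd(sample: str, bam: str, threads: int = 4, ref: str = "GRCh38") -> List[str]:
    """Argument list for one sample, using only the flags from the command above."""
    return [
        PYTHON, PREPARE_AA,
        "-s", sample,
        "-t", str(threads),
        "--cnvkit_dir", CNVKIT,
        "--bam", bam,
        "--ref", ref,
        "-o", f"{sample}_AA_out",
        "--run_AA", "--run_AC",
    ]


def already_finished(sample: str, workdir: Path = Path(".")) -> bool:
    """True if the (assumed) finish flag exists and reports completion."""
    flag = workdir / f"{sample}_AA_out" / f"{sample}_finish_flag.txt"
    return flag.is_file() and "All stages completed" in flag.read_text()


def run_batch(samples: Iterable[str], dry_run: bool = True) -> None:
    for sample in samples:
        if already_finished(sample):
            continue  # skip samples whose previous run completed
        cmd = build_cmd(sample, f"{sample}.bam")
        if dry_run:
            print(shlex.join(cmd))  # inspect the commands before launching
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
```

`run_batch(["sample1", "sample2"])` only prints the commands; passing `dry_run=False` runs them one after another.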

jluebeck commented 2 months ago

Hi,

I don't see anything problematic about your command or CNV preview.

One thing to keep in mind is that ecDNA frequency varies dramatically by cancer type. Some types show very little ecDNA. Can I ask what kind of cancer those 500 tumor samples represent?

Another variable to keep in mind is that ecDNA detection with AA is affected by sample purity to a certain degree. It is much more difficult to detect ecDNA in low-purity samples (particularly those below roughly 40% purity).
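One intuition for the purity effect is that, in bulk sequencing, the copy-number signal of a tumor-specific amplification is diluted by normal cells. A minimal sketch, assuming normal cells are diploid and every tumor cell carries the same copy number (both simplifications; ecDNA copy number in reality varies from cell to cell):

```python
def observed_copy_number(tumor_cn: float, purity: float, normal_cn: float = 2.0) -> float:
    """Bulk copy number of a region when a fraction `purity` of cells are tumor."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Weighted average of tumor and normal copy number across all cells.
    return purity * tumor_cn + (1.0 - purity) * normal_cn


for p in (0.8, 0.4, 0.2):
    print(f"purity {p:.0%}: observed CN {observed_copy_number(20, p):.1f}")
```

Under these assumptions a 20-copy amplification looks like about 16.4 copies at 80% purity but only about 5.6 copies at 20% purity.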

Thanks, Jens

jingydz commented 2 months ago

Is this related to the fact that I am using blood samples, not cancerous tissue?

jluebeck commented 2 months ago

That can definitely be a factor. Are these blood cancer samples or are you using blood samples from patients with solid tumors? If blood cancer samples, how were the cells of interest isolated?

jingydz commented 2 months ago

I used blood samples from patients with solid tumors; specifically, I isolated DNA from cells in the buffy coat (the white layer separated from whole blood). I noticed that the ecDNA rate in TCGA is also relatively low, probably below 10%, but TCGA likely uses tumor tissue, which reflects somatic variation, and ecDNA is associated with tumor progression. So might it be normal for the ecDNA rate in my blood samples to be below 1%?
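With counts this small, it helps to put an interval around the observed rate. A minimal sketch using the Wilson score interval for a binomial proportion; the inputs are the roughly 5 positives out of approximately 500 samples mentioned earlier in this thread (500 is approximate):

```python
from math import sqrt
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for k successes out of n trials (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


lo, hi = wilson_interval(5, 500)
print(f"5/500 = {5 / 500:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

This gives an interval of roughly 0.4% to 2.3%, so the data are consistent with an ecDNA rate around 1% in these blood samples.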

I am also puzzled by results from control cohorts of other individuals, such as SGDP, HGDP, and several others. I found an ecDNA rate as high as 21% in SGDP, but SGDP samples come from the general population, where the ecDNA rate should be even lower. Is it because AA is not suitable for germline samples?

jluebeck commented 2 months ago

I would expect the ecDNA detection rate to be far lower in peripheral blood than in the solid tumor. I am not at all surprised that you find a <1% ecDNA rate in peripheral blood from patients with solid tumors.

Regarding SGDP and HGDP, which are collections of normal samples: something is clearly wrong, because we and others who use our tool do not see ecDNA in normal tissues (the positive detection rate is exceptionally low). Keep in mind that some of the sequencing files in SGDP/HGDP may be whole-exome data, which could explain the positives you are finding.
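If the cohort metadata records the sequencing assay, exome runs can be set aside before running AA. A minimal sketch, assuming an SRA-style RunInfo CSV with `Run` and `LibraryStrategy` columns where exome data is labeled `WXS`; check the column names against the metadata you actually have. The accessions in the example are made up:

```python
import csv
import io
from typing import Dict, List


def split_by_strategy(runinfo_csv: str, column: str = "LibraryStrategy") -> Dict[str, List[str]]:
    """Group run accessions by library strategy (e.g. WGS vs WXS)."""
    groups: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        strategy = (row.get(column) or "").strip() or "unknown"
        groups.setdefault(strategy, []).append(row["Run"])
    return groups


# Made-up example metadata with hypothetical accessions.
example = "Run,LibraryStrategy\nRUN_A,WGS\nRUN_B,WXS\nRUN_C,WGS\n"
print(split_by_strategy(example))
```

Keeping only the `WGS` group and re-running the comparison would show whether the SGDP positives come from exome files.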