Closed SarahChen0401 closed 2 years ago
Hi @SarahChen0401, I'm not sure what got stuck there. It may be the ccsmeth code, or it may be other processes running on this machine. I suggest you re-submit this job and see what happens.
Also, how many lines does your input (./1_extract_res/chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.tsv) have? Do you have a GPU in this machine? If you do, can you show me the nvidia-smi output?
Best, Peng
Hi Peng, there are 15620865 lines in my input (./1_extract_res/chr22.CKCG-00309-CLR.clr.aln.bam.pbmm2.features.zscore.fb.depth1.tsv).
[chenshuhua@login04 1_extract_res]$ wc -l chr22.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.tsv
15620865 chr22.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.tsv
I do have a GPU on my machine. Here is the nvidia-smi output:
[chenshuhua@gvno02 ~]$ nvidia-smi
Mon Apr 18 13:38:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:1B:00.0 Off | 0 |
| N/A 29C P0 52W / 300W | 3741MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:1C:00.0 Off | 0 |
| N/A 26C P0 52W / 300W | 1874MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:60:00.0 Off | 0 |
| N/A 28C P0 54W / 300W | 3745MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | 0 |
| N/A 27C P0 51W / 300W | 3741MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... Off | 00000000:B1:00.0 Off | 0 |
| N/A 25C P0 51W / 300W | 3741MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... Off | 00000000:B2:00.0 Off | 0 |
| N/A 28C P0 51W / 300W | 3741MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:DA:00.0 Off | 0 |
| N/A 24C P0 52W / 300W | 3745MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:DC:00.0 Off | 0 |
| N/A 26C P0 53W / 300W | 0MiB / 32510MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3878703 C ...9_python3venv/bin/python3 1869MiB |
| 0 N/A N/A 3878706 C ...9_python3venv/bin/python3 1869MiB |
| 1 N/A N/A 3878711 C ...9_python3venv/bin/python3 1871MiB |
| 2 N/A N/A 3275195 C ...9_python3venv/bin/python3 1869MiB |
| 2 N/A N/A 3275198 C ...9_python3venv/bin/python3 1871MiB |
| 3 N/A N/A 3166020 C ...9_python3venv/bin/python3 1869MiB |
| 3 N/A N/A 3166021 C ...9_python3venv/bin/python3 1869MiB |
| 4 N/A N/A 4119752 C ...9_python3venv/bin/python3 1869MiB |
| 4 N/A N/A 4119753 C ...9_python3venv/bin/python3 1869MiB |
| 5 N/A N/A 3275201 C ...9_python3venv/bin/python3 1869MiB |
| 5 N/A N/A 3275202 C ...9_python3venv/bin/python3 1869MiB |
| 6 N/A N/A 3878704 C ...9_python3venv/bin/python3 1871MiB |
| 6 N/A N/A 3878707 C ...9_python3venv/bin/python3 1869MiB |
+-----------------------------------------------------------------------------+
As you mentioned, I submitted chromosomes 1-22 plus chrX, chrY, and chrM to run ccsmeth call_mods on several GPU nodes. They have been running for more than 6 days:
[chenshuhua@login04 ~]$ squeue | grep chenshu | grep ccsmeth3
13566908_19 rt-2080ti ccsmeth3 chenshuh R 6-06:23:16 1 grtq16
13566908_20 rt-2080ti ccsmeth3 chenshuh R 6-06:23:16 1 grtq16
13566908_21 rt-2080ti ccsmeth3 chenshuh R 6-06:23:16 1 grtq16
13566908_23 rt-2080ti ccsmeth3 chenshuh R 6-06:23:16 1 grtq16
13566908_17 rt-2080ti ccsmeth3 chenshuh R 8-21:36:56 1 grtq14
13566908_18 rt-2080ti ccsmeth3 chenshuh R 8-21:36:56 1 grtq14
13566893_1 rt-2080ti ccsmeth3 chenshuh R 10-02:59:41 1 grtq14
13566893_2 rt-2080ti ccsmeth3 chenshuh R 10-02:59:41 1 grtq14
13566893_3 rt-2080ti ccsmeth3 chenshuh R 10-02:59:41 1 grtq15
13566893_4 rt-2080ti ccsmeth3 chenshuh R 10-02:59:41 1 grtq15
13566893_5 rt-2080ti ccsmeth3 chenshuh R 10-02:59:41 1 grtq15
13566908_16 v100-oct ccsmeth3 chenshuh R 8-22:22:33 1 gvno02
13566908_13 v100-oct ccsmeth3 chenshuh R 9-06:20:59 1 gvno02
13566908_14 v100-oct ccsmeth3 chenshuh R 9-06:20:59 1 gvno02
13566908_15 v100-oct ccsmeth3 chenshuh R 9-06:20:59 1 gvno02
13566908_9 v100-oct ccsmeth3 chenshuh R 9-11:48:30 1 gvno01
13566908_10 v100-oct ccsmeth3 chenshuh R 9-11:48:30 1 gvno01
13566908_11 v100-oct ccsmeth3 chenshuh R 9-11:48:30 1 gvno01
13566908_12 v100-oct ccsmeth3 chenshuh R 9-11:48:30 1 gvno01
13566908_7 v100-oct ccsmeth3 chenshuh R 9-23:38:22 1 gvno02
13566908_8 v100-oct ccsmeth3 chenshuh R 9-23:38:22 1 gvno02
13566893_6 v100-oct ccsmeth3 chenshuh R 10-02:59:31 1 gvno02
These are the times at which the results were last updated:
[chenshuhua@login04 2_prediction_res]$ ls -lhrt
total 5.7G
-rw-r--r-- 1 chenshuhua yangjian 51M Apr 8 10:27 chr3.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 112M Apr 8 10:29 chr5.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 291M Apr 8 10:30 chr6.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 364M Apr 8 10:32 chr1.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 425M Apr 8 10:33 chr4.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 352M Apr 8 10:36 chr2.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 1.2G Apr 8 11:48 1_chr22.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 350M Apr 8 13:52 chr7.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 351M Apr 8 13:53 chr8.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 25M Apr 9 01:38 chr12.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 32M Apr 9 01:39 chr10.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 30M Apr 9 01:39 chr11.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 70M Apr 9 01:41 chr9.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 286M Apr 9 07:10 chr15.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 436M Apr 9 07:11 chr13.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 432M Apr 9 07:14 chr14.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 332M Apr 9 15:09 chr16.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 45M Apr 9 15:51 chr18.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 390M Apr 9 15:57 chr17.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 48M Apr 12 07:04 chr19.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 43M Apr 12 07:04 chr20.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 45M Apr 12 07:05 chr21.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 47M Apr 12 07:05 chrX.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 126M Apr 13 15:32 chrY.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
-rw-r--r-- 1 chenshuhua yangjian 1.6M Apr 13 15:33 chrM.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv
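The same check (which outputs have stopped growing while their jobs are still running) can be scripted. The following is only a sketch: the function name `stale_outputs`, the `*.call_mods.tsv` pattern, and the 6-hour threshold are assumptions chosen for illustration, not part of ccsmeth.

```python
import time
from pathlib import Path


def stale_outputs(directory, pattern="*.call_mods.tsv", max_idle_hours=6.0, now=None):
    """Return (file name, idle hours) for outputs not modified recently.

    `pattern` and `max_idle_hours` are illustrative defaults, not ccsmeth settings.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(directory).glob(pattern)):
        # Hours elapsed since the file was last written to.
        idle_hours = (now - path.stat().st_mtime) / 3600.0
        if idle_hours > max_idle_hours:
            stale.append((path.name, round(idle_hours, 1)))
    return stale


if __name__ == "__main__":
    for name, hours in stale_outputs("2_prediction_res"):
        print(f"{name}\tidle for {hours} h")
```

A file that is reported as idle while its job still shows as running in squeue is a candidate for a stalled job.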
It seems the results got stuck at different positions on different chromosomes:
[chenshuhua@login04 2_prediction_res]$ tail -n1 *
==> 1_chr22.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr22 50424885 + m64200e_210430_130607/56165855 7 0.132108 0.867892 1 CCCGC
==> chr10.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr10 799548 + m64189e_210427_070436/22413697 3 0.358548 0.641452 1 CACGC
==> chr11.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr11 1374051 + m64189e_210427_070436/48497010 1 0.035052 0.964948 1 TCCGG
==> chr12.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr12 852764 + m64189e_210427_070436/135596881 5 0.677823 0.322177 0 AACGA
==> chr13.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr13 39428504 + m64189e_210427_070436/30411209 3 0.141472 0.858528 1 CCCGT
==> chr14.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr14 43990296 + m64189e_210427_070436/123994691 5 0.682539 0.317461 0 AACGA
==> chr15.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr15 40283915 + m64200e_210430_130607/20578596 2 0.146907 0.853093 1 GCCGC
==> chr16.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr16 8718054 + m64189e_210427_070436/139068357 3 0.029902 0.970098 1 GCCGT
==> chr17.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr17 16570842 + m64189e_210427_070436/55575669 1 0.893883 0.106117 0 CACGA
==> chr18.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr18 2788336 + m64189e_210427_070436/169806003 3 0.219353 0.780647 1 ATCGT
==> chr19.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr19 1469032 + m64200e_210430_130607/28967102 6 0.997378 0.002622 0 AACGC
==> chr1.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr1 10391021 + m64189e_210427_070436/168297676 5 0.56864 0.43136 0 CACGT
==> chr20.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr20 838557 + m64200e_210430_130607/22611007 5 0.400747 0.599253 1 CTCGG
==> chr21.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr21 9033780 + m64200e_210430_130607/50398251 5 0.31292 0.68708 1 CCCGG
==> chr2.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr2 15008617 + m64189e_210427_070436/41617546 7 0.004072 0.995928 1 ATCGG
==> chr3.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr3 3194675 + m64200e_210430_130607/59704157 2 0.664408 0.335592 0 AACGC
==> chr4.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr4 14500250 + m64189e_210427_070436/141164661 7 0.000319 0.999681 1 ATCGA
==> chr5.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr5 4495723 + m64189e_210427_070436/114164665 3 0.926066 0.073934 0 CTCGT
==> chr6.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr6 12452371 + m64189e_210427_070436/57606513 1 0.190843 0.809157 1 AGCGA
==> chr7.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr7 11049452 + m64200e_210430_130607/88408785 1 0.354293 0.645707 1 GGCGA
==> chr8.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr8 17797788 + m64189e_210427_070436/128779112 3 0.017365 0.982635 1 TACGC
==> chr9.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chr9 4107563 + m64200e_210430_130607/111149217 2 0.08882 0.91118 1 GCCGC
==> chrM.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chrM 16541 + m64189e_210427_070436/91620274 4 0.511339 0.488661 0 CACGT
==> chrX.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chrX 1673712 + m64200e_210430_130607/4589605 3 0.406132 0.593868 1 CCCGG
==> chrY.CKCG-00309-HiFi.hifi.subreads.bam.pbmm2.features.zscore.fb.depth1.call_mods.tsv <==
chrY 56886943 + m64189e_210427_070436/38798596 5 0.657119 0.342881 0 CACGT
Oh, I did run through chr22 once, but I could not run through it again after that, and I also cannot run through the other chromosomes.
Best wishes, Shuhua
@SarahChen0401, thanks for clarifying these. Everything seems fine and ccsmeth should work in your case. One possible reason is that you submitted too many jobs at the same time and the machine doesn't have enough resources, so every job got stuck.
It seems you requested 16 processors/threads for each job. How many processors does your machine have? And how many jobs did you submit at the same time?
Best, Peng
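To make the resource argument concrete, here is a rough sketch of the arithmetic. The helper names and the 40-core node in the usage lines are hypothetical; the thread does not state the actual core counts, so substitute the real values for your nodes.

```python
def is_oversubscribed(total_cpus, concurrent_jobs, threads_each):
    """True if the jobs together request more threads than the node has CPUs."""
    return concurrent_jobs * threads_each > total_cpus


def threads_per_job(total_cpus, concurrent_jobs):
    """Largest per-job thread count that keeps all jobs within the CPU budget."""
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    # Always grant at least one thread, even when jobs outnumber CPUs.
    return max(total_cpus // concurrent_jobs, 1)


if __name__ == "__main__":
    # Hypothetical node: 40 CPUs shared by 4 concurrent jobs.
    print(is_oversubscribed(40, 4, 16))  # 4 * 16 = 64 threads requested
    print(is_oversubscribed(40, 4, 2))   # 4 * 2 = 8 threads requested
    print(threads_per_job(40, 4))
```

If the first check is true for a node, either lower the per-job thread count or submit fewer jobs to that node at once.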
Thanks Peng. I requested 2 threads for each job, and it works!
Best wishes, Shuhua
Dear Peng, I ran the 'ccsmeth call_mods' command and got these results:
I can get some results from chr22:
But the output seems to have stopped at chr22:17982584 for more than 10 hours.
I checked the process status and it is "S", meaning "stopping".
I was wondering whether the problem comes from my data, my GPU, or your software?
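A note on the "S" status: in the STAT column of Linux `ps`, "S" is interruptible sleep (the process is waiting for an event), while "T" is the code for a stopped process. The small lookup below is a sketch based on the state codes listed in the `ps` manual page; the function name `describe_state` is made up for this example.

```python
# Process state codes as documented in the Linux ps manual page.
PS_STATES = {
    "D": "uninterruptible sleep (usually I/O)",
    "I": "idle kernel thread",
    "R": "running or runnable",
    "S": "interruptible sleep (waiting for an event to complete)",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during tracing",
    "X": "dead",
    "Z": "defunct (zombie) process",
}


def describe_state(stat):
    """Describe a ps STAT value such as 'Sl+'; only the first character is the state."""
    if not stat:
        raise ValueError("empty STAT value")
    return PS_STATES.get(stat[0], "unknown state")


if __name__ == "__main__":
    print(describe_state("Sl+"))
```

So a process in state "S" has not been stopped; it is idle and waiting on something, which fits a job starved of resources or blocked on another process.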