Tuteja-Lab / UnsupervisedPeakCaller

Peak calling for ATAC-seq data using contrastive learning on biological replicates
8 stars 3 forks source link

main.py fails #8

Open francicco opened 2 months ago

francicco commented 2 months ago

Hi,

I'm testing your amazing method on my data after running it on the toy example you provide.

But this is the error I get:

main.py  --epochs 25 --batch_size 256 --datapath . --n_rep 6 --modelpath ./rcl.ckpt
Reading coverage files
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/main.py", line 269, in <module>
    cov = read_data_new(file)
          ^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/ios.py", line 90, in read_data_new
    return np.array(dat)
           ^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (99357,) + inhomogeneous part.

Any help? Thanks a lot F

hhvu0102 commented 2 months ago

Hello Francicco,

Thank you for your interest in our tool. It looks like the input files are not of the expected format, or there are too many files in the current directory that you are working on. Would you mind providing the following details:

  1. Did you run bash /path/to/preprocessing.bash -d <directory> -b <bam-file> -t <number of threads> -n <name> and obtain a file with similar format as our example file MCF7_chr10_rep*.covBga.txt? If yes, can you post a snippet of the input files?
  2. Did the file names coming from different replicates get numbered correctly, such as rep1, rep2, etc.?

It's also helpful sometimes to provide absolute path instead of ..

If possible, you can send me a file of one chromosome (for example, chr10), so I can help debugging.

Looking forward to hearing from you.

francicco commented 2 months ago

Hi Vu,

Thanks a lot for your quick reply. I've now run the preprocessing again with just 2 reps, I have 6 in total.

This is the command I executed: preprocessing.bash -d $PWD -b "LB43.PP.DupRem.bam LB44.PP.DupRem.bam" -t 12 -n PP.Hmel

I've got a few warning for scaffolds with no regions, something expected. But then at the end:

[STAGE 5] (/user/work/tk19812/software/UnsupervisedPeakCaller/preprocessing.bash:488) Extracting initial 1000 bp segments from candidate regions (regions3)....
[STAGE 5] (/user/work/tk19812/software/UnsupervisedPeakCaller/preprocessing.bash:519) Writing initial 1000-length segments to file "/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/chr*/PP.Hmel_Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem_*-regions3.txt"
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 817 times. Isn't it about time 
you run 'parallel --citation' once to silence the citation notice?

Error in serverSocket(port = port) : 
  creation of server socket failed: port 11044 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11351 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11022 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11980 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
Error in serverSocket(port = port) : 
  creation of server socket failed: port 11056 cannot be opened
Calls: makeCluster -> makeForkCluster -> serverSocket
Execution halted
[ERROR] Failed to write "/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/chrHmel200083o/PP.Hmel_Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem_Hmel200083o-regions3.txt" regions3 file.

I then rerun it using -g Hmel because this is a butterfly. It went through STAGE 7 with some errors in small scaffold with negative starting positions:

Hmel200006o -381    619 Hmel200006osegment1

It ended:

Candidate regions are in file:
    PP.Hmel_Hmel.ATACseq.LB43.PP_bigInputs.txt

Then I run: run_rcl.sh -d $PWD -w -b "LB43.PP.DupRem.bam LB44.PP.DupRem.bam"

And main.py fails:

main.py  --epochs 25 --batch_size 256 --datapath 'mypath' --n_rep 2 --modelpath 'mypath'/rcl.ckpt
Finish training, start writing results (if your data is large, please give more memory)
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/rcl_score.py", line 133, in <module>
    classification = Classify(str(args.model))    ## train on all (80 or 90 %)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/rcl_score.py", line 25, in __init__
    base_model = ContrastLearn.load_from_checkpoint(embeddings_model_path).model
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/utilities/model_helpers.py", line 125, in wrapper
    return self.method(cls, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/core/module.py", line 1582, in load_from_checkpoint
    loaded = _load_from_checkpoint(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/pytorch_lightning/core/saving.py", line 63, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/lightning_fabric/utilities/cloud_io.py", line 59, in _load
    with fs.open(path_or_url, "rb") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/spec.py", line 1303, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 191, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 355, in __init__
    self._open()
  File "/user/home/tk19812/.local/lib/python3.12/site-packages/fsspec/implementations/local.py", line 360, in _open
    self.f = open(self.path, mode=self.mode)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rcl.ckpt'

The *rep1.covBga.txt files look like:

==> Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.covBga.txt <==
Hmel221001o 20131   20132   Hmel221001osegment1 33
Hmel221001o 20132   20139   Hmel221001osegment1 36
Hmel221001o 20139   20141   Hmel221001osegment1 34
Hmel221001o 20141   20142   Hmel221001osegment1 30
Hmel221001o 20142   20144   Hmel221001osegment1 31
Hmel221001o 20144   20156   Hmel221001osegment1 32
Hmel221001o 20156   20168   Hmel221001osegment1 33
Hmel221001o 20168   20177   Hmel221001osegment1 35
Hmel221001o 20177   20178   Hmel221001osegment1 33
Hmel221001o 20178   20179   Hmel221001osegment1 32

==> Hmel.ATACseq.LB44.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.covBga.txt <==
Hmel221001o 20131   20138   Hmel221001osegment1 9
Hmel221001o 20138   20139   Hmel221001osegment1 10
Hmel221001o 20139   20168   Hmel221001osegment1 9
Hmel221001o 20168   20179   Hmel221001osegment1 10
Hmel221001o 20179   20196   Hmel221001osegment1 9
Hmel221001o 20196   20198   Hmel221001osegment1 10
Hmel221001o 20198   20250   Hmel221001osegment1 9
Hmel221001o 20250   20320   Hmel221001osegment1 10
Hmel221001o 20320   20323   Hmel221001osegment1 12
Hmel221001o 20323   20325   Hmel221001osegment1 56

Did the file names coming from different replicates get numbered correctly, such as rep1, rep2, etc.?

I think so. I've now changed to a full path.

The files are too big, I've made a link to it. https://drive.google.com/file/d/1DyKIxm2m2BNSqc3GzfcZ06xBQafnL8Vw/view?usp=sharing

Thanks a lot F

hhvu0102 commented 2 months ago

Hi Francesco, Thank you very much for the details.

  1. The error

Error in serverSocket(port = port) : creation of server socket failed: port 11044 cannot be opened Calls: makeCluster -> makeForkCluster -> serverSocket Execution halted Error in serverSocket(port = port) : creation of server socket failed: port 11351 cannot be opened Calls: makeCluster -> makeForkCluster -> serverSocket Execution halted Error in serverSocket(port = port) : creation of server socket failed: port 11022 cannot be opened Calls: makeCluster -> makeForkCluster -> serverSocket Execution halted Error in serverSocket(port = port) : creation of server socket failed: port 11980 cannot be opened Calls: makeCluster -> makeForkCluster -> serverSocket Execution halted Error in serverSocket(port = port) : creation of server socket failed: port 11056 cannot be opened Calls: makeCluster -> makeForkCluster -> serverSocket Execution halted

means that not all of the cores you supplied were available at the time of process. Our code in preprocessing.bash assumes the number of threads in -t can also be used as the number of cores for some internal parallel processing. This also explains why the next time you ran the code, it went through successfully - it's because the second time there were 12 cores available to run. I will update the code so that it stops this behavior. In the meantime, you may consider reducing the number of threads, but it may take a bit longer to process.

  1. About negative starting positions:

Hmel200006o -381 619 Hmel200006osegment1

I apologize for this error. We have some sanity check step in the code for this and it worked ok for mouse and human, but maybe it failed for non-mouse/human chromosome names. I will work on an update soon.

  1. About main.py failing: This output message Finish training, start writing results (if your data is large, please give more memory) means that main.py may have run successfully. Would you mind confirming if /user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rcl.ckpt actually exist?

  2. I'll send a request to access the data and try working on it next week.

Thank you for your patience!

hhvu0102 commented 2 months ago

Hi Francesco,

I have not completely resolved all of the bugs, but I'd like to follow up with what I have so far. Unfortunately, I could not replicate your error. With your data, I have successfully trained a model and predicted peaks. Here's what I did:

cd /home/vthihong/tools/UnsupervisedPeakCaller #here's where I store RCL code, i.e., where one would clone the code into

source  ~/1_denoisedATAC/envRCL/bin/activate #activate RCL environment
bash run_rcl.sh -d "/home/vthihong/testTools/0_testRCL/chrHmel221001o/" -b "Hmel.ATACseq.LB43.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.bam Hmel.ATACseq.LB44.PP.TrimGalore.BWA.SortedByCoords.filtered.DupRem.bam" -r "" -g -s -w
deactivate

Under the directory /home/vthihong/testTools/0_testRCL/chrHmel221001o/, I stored your data in a subdirectory called Hmel221001o which corresponds to the chromosome you sent.

Upon inspecting the code using your data, it appears to me that one region is not 1000bp long (Hmel221001osegment4306), which indicates there's some minor bug in the upstream steps that I will work on soon. This problem should not affect if the model is trained or not, but will affect the rcl_score.py step. In order to successfully run rcl_score.py, I simply removed Hmel221001osegment4306 from the covBga.txt files, but this is not a long term solution.

Here are my suggestions for now:

  1. Check if rcl.ckpt actually exists in the $PWD directory.
  2. If rcl.ckpt does not exist, I assume there should be a file called out.err in the $PWD directory. We'll need to look into that output to debug.

Thanks, Ha

francicco commented 2 months ago

Hi @hhvu0102,

Thanks a lot! I'll give it another go and I'll let you know! Cheers F

francicco commented 2 months ago

Another thing I noticed.

I hade to change a line in conclu.py.

From from scipy.signal import gaussian to from scipy.signal.windows import gaussian

what version of python/scipy is better using?

Cheers F

hhvu0102 commented 2 months ago

Hi,

I'm using Python 3.7.10 for RCL. Here are other packages I have in the environment:

Package                  Version
------------------------ ------------
-orch                    1.10.0
aiohttp                  3.8.6
aiosignal                1.3.1
annotated-types          0.5.0
anyio                    3.7.1
arrow                    1.2.3
async-timeout            4.0.3
asynctest                0.13.0
attrs                    23.1.0
beautifulsoup4           4.12.2
blessed                  1.20.0
boto3                    1.29.4
botocore                 1.32.4
certifi                  2023.11.17
charset-normalizer       3.3.2
click                    8.1.7
croniter                 1.3.15
cycler                   0.11.0
dateutils                0.6.12
deepdiff                 6.7.1
exceptiongroup           1.1.3
fastapi                  0.88.0
fonttools                4.38.0
frozenlist               1.3.3
fsspec                   2023.1.0
h11                      0.14.0
idna                     3.4
importlib-metadata       6.7.0
inquirer                 2.10.1
itsdangerous             2.1.2
Jinja2                   3.1.2
jmespath                 1.0.1
joblib                   1.3.2
kiwisolver               1.4.5
lightning                1.9.5
lightning-cloud          0.5.55
lightning-utilities      0.10.0
markdown-it-py           2.2.0
MarkupSafe               2.1.3
matplotlib               3.5.3
mdurl                    0.1.2
multidict                6.0.4
numpy                    1.21.6
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
ordered-set              4.1.0
packaging                23.2
pandas                   1.3.5
Pillow                   9.5.0
pip                      20.1.1
psutil                   5.9.6
pydantic                 2.5.1
pydantic-core            2.14.3
pygments                 2.17.1
PyJWT                    2.8.0
pyparsing                3.1.1
python-dateutil          2.8.2
python-editor            1.0.4
python-multipart         0.0.6
pytorch-lightning        1.9.5
pytz                     2023.3.post1
PyYAML                   6.0.1
readchar                 4.0.5
requests                 2.31.0
rich                     13.7.0
s3transfer               0.7.0
scikit-learn             1.0.2
scipy                    1.7.3
seaborn                  0.12.2
setuptools               47.1.0
six                      1.16.0
sniffio                  1.3.0
soupsieve                2.4.1
starlette                0.29.0
starsessions             1.3.0
threadpoolctl            3.1.0
torch                    1.10.0
torchaudio               0.10.0
torchmetrics             0.11.4
torchvision              0.11.0
tqdm                     4.66.1
traitlets                5.9.0
typing-extensions        4.7.1
urllib3                  2.0.7
uvicorn                  0.22.0
wcwidth                  0.2.10
websocket-client         1.6.1
websockets               11.0.3
wheel                    0.41.3
yarl                     1.9.3
zipp                     3.15.0
francicco commented 2 months ago

Hi @hhvu0102,

I reinstalled everything, re-run the example, and it worked but still I'm not able to generate the model:

/user/home/tk19812/.local/lib/python3.12/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
/user/home/tk19812/.local/lib/python3.12/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Reading coverage files
Reading RCL input file /user/work/tk19812/HeliconiniiProject/ATACseqProjectTrimmed/Hmel/rep1.txt.
Traceback (most recent call last):
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/main.py", line 267, in <module>
    cov = read_data_new(file)
          ^^^^^^^^^^^^^^^^^^^
  File "/user/work/tk19812/software/UnsupervisedPeakCaller/ios.py", line 90, in read_data_new
    return np.array(dat)
           ^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (100584,) + inhomogeneous part.

:(

F