ghm17 / LOGODetect

LOGODetect is a powerful tool to identify small segments that harbor local genetic correlation between two traits/diseases.
GNU General Public License v3.0

LDSC Component does not run - can this be overridden? #18

Closed: nrs225 closed this issue 1 year ago

nrs225 commented 1 year ago

Thank you for developing LOGODetect. I have a working copy of LDSC that I load as a module on my HPC cluster. When I run the tutorial example with the --ldsc_dir flag set to the directory where the cluster copy of LDSC is installed, I receive the following error messages:

Extracting number of samples and rownames from 1000G_EUR_QC.fam...
Extracting number of variants and colnames from 1000G_EUR_QC.bim...
  File "/mnt/storage/apps/eb/software/ldsc/1.0.1-GCCcore-11.2.0/bin/munge_sumstats.py", line 583
    if args.daner_n:
TabError: inconsistent use of tabs and spaces in indentation
  File "/mnt/storage/apps/eb/software/ldsc/1.0.1-GCCcore-11.2.0/bin/munge_sumstats.py", line 583
    if args.daner_n:
TabError: inconsistent use of tabs and spaces in indentation
  File "/mnt/storage/apps/eb/software/ldsc/1.0.1-GCCcore-11.2.0/bin/ldsc.py", line 84
    print msg
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(msg)?
Error in file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") : 
  cannot open the connection
In addition: Warning message:
In file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") :
  cannot open file '/mnt/storage/nobackup/nrs225/Sims/LOGODetect/LOGODetect_data/results/tmp_files/ldsc/ldsc_rg.log': No such file or directory
Execution halted

It appears that LDSC does not work when run via LOGODetect. I have no problems running LDSC on its own, so could I run it separately and then feed the relevant LDSC output file into LOGODetect? Also, which variables does LOGODetect need from the LDSC output?

Many thanks!

ghm17 commented 1 year ago

The problem may be caused by a mismatched version of LDSC. LOGODetect runs LDSC twice: first to estimate the heritability of the two traits (it would be straightforward to feed these estimates to LOGODetect at this stage), and second to perform the stratified genetic covariance analysis. It would be much simpler to first set the --ldsc_dir flag to the LDSC directory provided with LOGODetect and see whether that works. If not, I can try to separate the stratified genetic covariance analysis from the LOGODetect software.
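The TabError and the `print msg` SyntaxError in the log above are what a Python 3 interpreter reports for Python 2 source, and the original LDSC release targets Python 2.7. A minimal sketch for checking a script directly, without going through LOGODetect, assuming it is run with the same interpreter that LOGODetect ends up calling; `syntax_check` is a hypothetical helper, not part of LDSC or LOGODetect:

```python
import sys


def syntax_check(source, filename="<script>"):
    """Return None if `source` compiles under this interpreter, else the error text."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:  # TabError and IndentationError are subclasses
        return "{}: {}".format(type(err).__name__, err.msg)
    return None


if __name__ == "__main__":
    # Usage (file name is illustrative): python check_syntax.py /path/to/munge_sumstats.py
    print("interpreter: " + sys.version.split()[0])
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path + " -> " + (syntax_check(handle.read(), path) or "OK"))
```

If the script compiles cleanly under one interpreter and fails under another, the interpreter rather than the LDSC copy is the more likely culprit.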

nrs225 commented 1 year ago

Thank you for responding. I get exactly the same error message when I use the LDSC directory provided as part of the LOGODetect installation. I can remove the "TabError: inconsistent use of tabs and spaces in indentation" error by manually replacing all tabs with spaces in the munge_sumstats.py script you provide, but I am then left with a new set of error messages that I cannot resolve myself.
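The manual tab-to-space edit described above can also be scripted. A minimal sketch, assuming each tab in the indentation should expand to the next multiple of 8 columns (the rule Python 2 applies when it compares indentation); `expand_leading_tabs` is a hypothetical helper, not something shipped with LDSC or LOGODetect:

```python
def expand_leading_tabs(text, tabsize=8):
    """Expand tabs in each line's leading whitespace; leave the rest of the line alone."""
    fixed = []
    for line in text.splitlines(True):  # True keeps the line endings
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        fixed.append(indent.expandtabs(tabsize) + body)
    return "".join(fixed)
```

To apply it, read the script, pass its text through the helper, and write the result to a new file, keeping the original as a backup. Only leading whitespace is touched, so tabs inside string literals on a single line are preserved, but lines inside multi-line strings that start with tabs would be changed as well.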

This is the new error message I receive when I run LOGODetect with the flag --ldsc_dir set to your provided LDSC directory (after editing munge_sumstats.py):

Extracting number of samples and rownames from 1000G_EUR_QC.fam...
Extracting number of variants and colnames from 1000G_EUR_QC.bim...
Traceback (most recent call last):
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/munge_sumstats.py", line 12, in <module>
    from ldscore import sumstats
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/ldscore/sumstats.py", line 13, in <module>
    import parse as ps
ModuleNotFoundError: No module named 'parse'
Traceback (most recent call last):
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/munge_sumstats.py", line 12, in <module>
    from ldscore import sumstats
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/ldscore/sumstats.py", line 13, in <module>
    import parse as ps
ModuleNotFoundError: No module named 'parse'
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/ldsc.py", line 84
    print msg
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(msg)?
Error in file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") : 
  cannot open the connection
In addition: Warning message:
In file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") :
  cannot open file '/mnt/storage/nobackup/nrs225/Sims/LOGODetect/LOGODetect_data/results/tmp_files/ldsc/ldsc_rg.log': No such file or directory
Execution halted

Thanks

ghm17 commented 1 year ago

As you suggested, I have replaced all tabs with spaces in the munge_sumstats.py script. As for your new error message, did you activate the ldsc environment before running LOGODetect? The error may be due to a mismatched Python version.
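Whether the expected environment is active can be checked by asking the `python` found on the PATH which version it is, from the same shell session that launches LOGODetect. A minimal sketch; `python_major_version` is a hypothetical helper, and the expectation that LDSC needs Python 2 is an assumption based on the `print msg` statement in the traceback:

```python
import subprocess
import sys


def python_major_version(executable="python"):
    """Return the major version (2 or 3) of the interpreter that `executable` resolves to."""
    out = subprocess.check_output(
        [executable, "-c", "import sys; sys.stdout.write(str(sys.version_info[0]))"]
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    print("this script runs on Python " + str(sys.version_info[0]))
    try:
        print("'python' on PATH is Python " + str(python_major_version("python")))
    except (OSError, subprocess.CalledProcessError):
        print("no usable 'python' executable found on PATH")
```

A result of 3 here, with the ldsc environment supposedly active, would suggest that the environment is not the one actually being used when LDSC is launched.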

nrs225 commented 1 year ago

Thank you for the update. This has still not helped - see error message below:

Extracting number of samples and rownames from 1000G_EUR_QC.fam...
Extracting number of variants and colnames from 1000G_EUR_QC.bim...
Traceback (most recent call last):
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/munge_sumstats.py", line 3, in <module>
    import pandas as pd
  File "/mnt/storage/apps/eb/software/SciPy-bundle/2021.10-foss-2021b/lib/python3.9/site-packages/pandas/__init__.py", line 13
    missing_dependencies.append(f"{dependency}: {e}")
                                                   ^
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/munge_sumstats.py", line 3, in <module>
    import pandas as pd
  File "/mnt/storage/apps/eb/software/SciPy-bundle/2021.10-foss-2021b/lib/python3.9/site-packages/pandas/__init__.py", line 13
    missing_dependencies.append(f"{dependency}: {e}")
                                                   ^
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/ldsc.py", line 12, in <module>
    import ldscore.ldscore as ld
  File "/mnt/storage/nobackup/nrs225/Sims/LOGODetect/ldsc/ldscore/ldscore.py", line 2, in <module>
    import numpy as np
  File "/mnt/storage/apps/eb/software/SciPy-bundle/2021.10-foss-2021b/lib/python3.9/site-packages/numpy/__init__.py", line 132
    raise ImportError(msg) from e
                              ^
SyntaxError: invalid syntax
Error in file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") : 
  cannot open the connection
In addition: Warning message:
In file(paste0(out_dir, "/tmp_files/ldsc/ldsc_rg.log"), "r") :
  cannot open file '/mnt/storage/nobackup/nrs225/Sims/LOGODetect/LOGODetect_data/results/tmp_files/ldsc/ldsc_rg.log': No such file or directory
Execution halted

I cannot get the provided copy of LDSC to run at all: I have activated the environment as per the instructions and LDSC still does not run. This is why I use the LDSC module on the HPC cluster, which works fine on its own.

Please could you provide an option to feed the appropriate output from LDSC (run separately) into LOGODetect?

Thanks

nrs225 commented 1 year ago

I have tried to get LOGODetect to work, but there appears to be an issue with how Python is called from within R: the call seems to override the environment settings that LDSC needs, which is why the LDSC component keeps failing. Could you please resolve this issue?

Many thanks.
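In the last traceback, the LDSC scripts import pandas and numpy from a `python3.9/site-packages` directory and then fail on syntax that older interpreters do not accept, which is consistent with one interpreter picking up packages built for another (for example through a `PYTHONPATH` set by a cluster module). That reading is an assumption, not something confirmed in this thread. A minimal diagnostic sketch, meant to be run with the same interpreter that runs LDSC; `mismatched_paths` is a hypothetical helper:

```python
from __future__ import print_function  # keeps print() output identical on Python 2

import os
import re
import sys


def mismatched_paths(paths, major, minor):
    """Return entries of `paths` that name a pythonX.Y directory for a different X.Y."""
    wanted = "{}.{}".format(major, minor)
    bad = []
    for entry in paths:
        found = re.search(r"python(\d+\.\d+)", entry)
        if found and found.group(1) != wanted:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "<unset>"))
    for entry in mismatched_paths(sys.path, sys.version_info[0], sys.version_info[1]):
        print("built for another Python version:", entry)
```

Any path it reports is a candidate for removal from the environment before LDSC is launched.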

ghm17 commented 1 year ago

Can you provide me with the input data? I could try running LOGODetect myself and send you back the results.

nrs225 commented 1 year ago

I have solved the issue: it appears that LOGODetect is not compatible with R version 4.1.2.