Lucy-Forrest-Lab / HDXer

HDXer is a package to compute Hydrogen-Deuterium exchange data from biomolecular simulations, compare to experiment, and perform ensemble refinement to fit a structural ensemble to the experimental data.
BSD 3-Clause "New" or "Revised" License

Line 176 is throwing ValueError when the seg file is with two columns in Python 3.8 #4

Closed r78v10a07 closed 1 year ago

r78v10a07 commented 1 year ago

Hi, line 176 in analysis.py throws a ValueError when reading a seg file with two columns. The code that should read the two-column format is then skipped, because the ValueError goes straight to the last except clause:

Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 176, in read_segfile
    self.segres = np.loadtxt(self.params['segfile'],
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1356, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in _read
    arr = _load_from_filelike(
ValueError: the number of columns changed from 3 to 2 at row 1; use `usecols` to select a subset and avoid this error

https://github.com/Lucy-Forrest-Lab/HDXer/blob/6ad1d73931f6a53922c3c960e6c3f67ebcbd7161/HDXer/analysis.py#L176

lucyforrest commented 1 year ago

Hi Roberto,

Please can you provide the input seg file?

Thanks Lucy

fabsugar commented 1 year ago

Hi Roberto, does your segfile have the same format as the one provided in the tutorial? See for example the one provided for BPTI: $HDXER_PATH/tutorials/BPTI/BPTI_expt_data/BPTI_residue_segs.txt

The format should be (usually 2 columns and optionally 3):

start_residue end_residue chain_index[optional]
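
As a rough illustration of that format, here is a minimal pure-Python sketch that parses segment lines with an optional third column. The names `Segment` and `parse_segments` are hypothetical and not part of HDXer, and the sketch assumes every row must have the same number of columns.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    start: int
    end: int
    chain: Optional[int] = None  # third column is optional


def parse_segments(text: str) -> List[Segment]:
    """Parse 'start end [chain]' rows; all rows must share one column count."""
    segments = []
    width = None  # column count of the first non-empty row
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 2 or 3 columns, got {len(fields)}")
        if width is None:
            width = len(fields)
        elif len(fields) != width:
            raise ValueError(f"line {lineno}: column count changed from {width} to {len(fields)}")
        # int() raises ValueError for non-integer fields
        segments.append(Segment(*map(int, fields)))
    return segments
```

Calling `parse_segments("1 5\n6 10\n")` gives two segments with `chain` set to `None`, while a file that mixes 2-column and 3-column rows is rejected with a ValueError.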

best, Fabrizio

r78v10a07 commented 1 year ago

Similar error with the experimental file Archive.zip

r78v10a07 commented 1 year ago

The error reading the exp file is:

Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 208, in read_expfile
    expt = np.loadtxt(self.params['expfile'], dtype=[ ('segres', np.int32, (2,)),
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1356, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in _read
    arr = _load_from_filelike(
ValueError: the number of columns changed from 10 to 9 at row 1; use `usecols` to select a subset and avoid this error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/calc_hdx.py", line 238, in <module>
    analysis = analysis.run(cachefn='analysis.pkl')
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/functions.py", line 202, in pickle_wrapped_func
    new_obj = func(*args, **kwargs)
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 580, in run
    self.read_expfile()
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 211, in read_expfile
    raise HDX_Error("There's a problem with the experimental data file. It has too few timepoints. \n" \
HDXer.errors.HDX_Error: There's a problem with the experimental data file. It has too few timepoints.
This can be caused if you've defined chain indices in the segments file but not in the experimental data file.
The error while reading was: the number of columns changed from 10 to 9 at row 1; use `usecols` to select a subset and avoid this error

fabsugar commented 1 year ago

Hi Roberto, please check whether removing the chain index from the segfile solves the problem (otherwise you probably have to add the chain index to the exp file as well). Specifically, remove the last column of "0" in the file B.seg.

Please note that your exp file contains deuteration percentages (values in the range 0-100); it should instead contain fractions (values in the range 0-1). See this file for reference: https://github.com/Lucy-Forrest-Lab/HDXer/blob/master/tutorials/BPTI/BPTI_expt_data/BPTI_expt_dfracs.dat

Additionally, please consider whether you want to remove the exp data for time=0 (unless you need them for some reason); they should correspond to zero deuteration by default.

Hope it helps, Fabrizio
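
The percent-to-fraction conversion mentioned in this comment could be done with a small helper like the one below. This is only a sketch: `percent_row_to_fractions` is a hypothetical name, and it assumes each row holds two residue indices followed by one value per timepoint, with the time=0 value first.

```python
def percent_row_to_fractions(line: str, drop_first_timepoint: bool = False) -> str:
    """Convert one 'start end v1 v2 ...' row from percentages to fractions."""
    fields = line.split()
    start, end = fields[0], fields[1]  # residue indices are kept as-is
    values = [float(v) / 100.0 for v in fields[2:]]  # 0-100 -> 0-1
    if drop_first_timepoint:
        values = values[1:]  # assumed to be the time=0 column
    return " ".join([start, end] + [f"{v:.4f}" for v in values])


print(percent_row_to_fractions("1 5 0.0 12.5 50"))
# prints: 1 5 0.0000 0.1250 0.5000
print(percent_row_to_fractions("1 5 0.0 12.5 50", drop_first_timepoint=True))
# prints: 1 5 0.1250 0.5000
```

If the time=0 column is dropped from the exp file, the matching 0.0 entry would presumably also need to be removed from the timepoints passed on the command line.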

r78v10a07 commented 1 year ago

Hi, I still get the same exceptions. I put together all the data I'm using so you can test on your side. Inside the ZIP file there is a Dockerfile that I use to run HDXer.

Build the docker image like:

cd hdxer
docker build -t hdxer .

Then run the command:

docker run -i -t -v `pwd`:/data hdxer python /usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/calc_hdx.py -t /data/ranked_0_H.pdb -p /data/ranked_0_H.gro -m BestVendruscolo -exp /data/exp.dat -seg /data/seg.seg -out /data/test/ -dt 0.0 0.16666666666666666 0.8333333333333334 4.166666666666667 20.833333333333332 104.16666666666667 16666.65

The ZIP file also contains the HDX_analysis.log and the test output folder

The error I'm getting is:

/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/core/_methods.py:269: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/core/_methods.py:258: RuntimeWarning: invalid value encountered in divide
  ret = um.true_divide(
/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/dfpred.py:505: RuntimeWarning: divide by zero encountered in log
  logerr = logf + np.log(err) + np.log(time) # sd(e^Aa) = f * sd(A) * a
Residue predictions complete
/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py:900: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  dtype = np.dtype(dtype)
Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/functions.py", line 192, in pickle_wrapped_func
    cached_obj = pickle.load(open(fn,'rb'))
FileNotFoundError: [Errno 2] No such file or directory: '/data/test/analysis.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 176, in read_segfile
    self.segres = np.loadtxt(self.params['segfile'],
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1356, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in _read
    arr = _load_from_filelike(
ValueError: the number of columns changed from 3 to 2 at row 1; use `usecols` to select a subset and avoid this error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/calc_hdx.py", line 238, in <module>
    analysis = analysis.run(cachefn='analysis.pkl')
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/functions.py", line 202, in pickle_wrapped_func
    new_obj = func(*args, **kwargs)
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 575, in run
    self.read_segfile()
  File "/usr/envs/HDXER_ENV/lib/python3.8/site-packages/HDXer/analysis.py", line 189, in read_segfile
    raise HDX_Error("There's a problem reading the values in your segments file: %s \n"
HDXer.errors.HDX_Error: There's a problem reading the values in your segments file: %s
File should contain either 2 or 3 columns of integers, separated by spaces.
Format: start_residue end_residue chain_index[optional]

hdxer.tar.gz

r78v10a07 commented 1 year ago

I created a PR that fixes the error: https://github.com/Lucy-Forrest-Lab/HDXer/pull/5

The problem is that in Python ~=3.8 the numpy function np.loadtxt throws a ValueError directly instead of an IndexError.

r78v10a07 commented 1 year ago

Could anyone please review PR #5, which I created, and merge it if it is correct?

fabsugar commented 1 year ago

Hi Roberto, I am checking whether this is the only place in the code that leads to this problem (and will correct any others as well). Before merging your fix into the base branch I also need to run more systematic tests to assess which Python versions lead to this error and, if needed, implement a more general fix that works across different versions. Thank you again.

fabsugar commented 1 year ago

Commit f06191e solves issue #4. The issue appeared when using more recent numpy versions (e.g. 1.24.2), which, as mentioned by Roberto, throw a ValueError instead of an IndexError when reading the segfile (in analysis.py), and this caused the error. Commit f06191e modifies the code that reads the segfile to use a nested except: and try: instead of except IndexError:. This modification works for both recent and previous numpy versions.
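
For readers who want to see the general idea, the sketch below shows a version-independent fallback: try the 3-column layout first, and treat either exception type as a signal to retry with 2 columns. It is not the code from commit f06191e; it uses a plain-Python parser in place of np.loadtxt, and `parse_fixed_width` and `read_segments` are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def parse_fixed_width(text: str, ncols: int) -> List[Row]:
    """Parse whitespace-separated integer rows with exactly ncols columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != ncols:
            raise ValueError(f"expected {ncols} columns, got {len(fields)}")
        rows.append(tuple(int(f) for f in fields))  # ValueError on non-integers
    return rows


def read_segments(text: str) -> List[Row]:
    """Try 3 columns, then 2, whichever exception type the parser raises."""
    try:
        return parse_fixed_width(text, 3)
    except (ValueError, IndexError):
        # Nested try: the 2-column fallback runs for either exception type,
        # so the behaviour does not depend on which one the parser raises.
        try:
            return parse_fixed_width(text, 2)
        except (ValueError, IndexError) as err:
            raise ValueError(
                "segments file should contain 2 or 3 columns of integers"
            ) from err
```

Catching both exception types in one tuple keeps the fallback path reachable regardless of which one the underlying reader raises, which is the property the fix needs across numpy versions.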