dieterich-lab / scimodom

Sci-ModoM: A quantitative database of transcriptome-wide high-throughput RNA modification sites
https://dieterich-lab.github.io/scimodom/
GNU Affero General Public License v3.0

Sequence context not working on production #131

Closed eboileau closed 3 months ago

eboileau commented 3 months ago

Aims/objectives.

Reading the sequence context works fine locally (development), but fails with

39717e708d0f 2024-08-09 13:54:35 [ERROR] scimodom | Exception on /api/v0/modification/genomic-context/5 [GET]
58d25db9ee8d 2024-08-09 13:54:08 0 [Note] mariadbd: ready for connections.
39717e708d0f Traceback (most recent call last):
58d25db9ee8d Version: '11.2.3-MariaDB-1:11.2.3+maria~ubu2204'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask/app.py", line 1473, in wsgi_app
58d25db9ee8d 2024-08-09 13:54:08 0 [Note] InnoDB: Buffer pool(s) load completed at 240809 13:54:08
39717e708d0f     response = self.full_dispatch_request()
39717e708d0f                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask/app.py", line 882, in full_dispatch_request
39717e708d0f     rv = self.handle_user_exception(e)
39717e708d0f          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
39717e708d0f     return cors_after_request(app.make_response(f(*args, **kwargs)))
39717e708d0f                                                 ^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask/app.py", line 880, in full_dispatch_request
39717e708d0f     rv = self.dispatch_request()
39717e708d0f          ^^^^^^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask/app.py", line 865, in dispatch_request
39717e708d0f     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
39717e708d0f            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/flask_cors/decorator.py", line 130, in wrapped_function
39717e708d0f     resp = make_response(f(*args, **kwargs))
39717e708d0f                          ^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/scimodom/api/modification.py", line 115, in get_genomic_sequence_context
39717e708d0f     sequence = file_service.read_sequence_context(seq_file)
39717e708d0f                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39717e708d0f   File "/app/venv/lib/python3.11/site-packages/scimodom/services/file.py", line 492, in read_sequence_context
39717e708d0f     sequence = fh.readlines()[1].strip()
39717e708d0f                ^^^^^^^^^^^^^^
39717e708d0f   File "<frozen codecs>", line 322, in decode
39717e708d0f UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 23: invalid start byte
exit code: 0

on production. This happens here: https://github.com/dieterich-lab/scimodom/blob/d297d0eb43b2b08c10452b4cb946b6c6304c4adc/server/src/scimodom/services/file.py#L489
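To make the failure easier to diagnose, the read could be done in binary mode and decoded explicitly, so that a non-text sequence line produces an error naming the file and the offending byte instead of a bare UnicodeDecodeError. A minimal sketch, assuming the temporary file is a two-line FASTA (header, then sequence); `read_sequence_line` is a hypothetical helper, not the current code in file.py:

```python
def read_sequence_line(path):
    """Return the second line of a FASTA-like file as text.

    Hypothetical helper for illustration; not the scimodom implementation.
    """
    # Read raw bytes so that decoding problems are caught here, with context.
    with open(path, "rb") as fh:
        lines = fh.read().splitlines()
    if len(lines) < 2:
        raise ValueError(f"{path}: expected a header line and a sequence line")
    raw = lines[1].strip()
    try:
        # Nucleotide sequences are expected to be plain ASCII.
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{path}: sequence line is not ASCII text "
            f"(byte {raw[exc.start]:#04x} at offset {exc.start})"
        ) from exc
```

This does not fix the underlying problem; it only turns the 500 error into a message that points at the temporary file.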

In Docker, we have /tmp/bedtools/pybedtools.5wq011d4.tmp: ISO-8859 text, and the offending line is composed of non-readable characters (bytes).

Locally, we have pybedtools.br03fzlu.tmp: ASCII text and the line is text.

So reading and processing the sequence, e.g. from /data/assembly/Homo_sapiens/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.2.fa.gz, seems to be done differently in the two environments, as if pybedtools behaves differently when reading or writing files.
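To compare the two environments, the same two checks could be run on both sides: whether the input FASTA is actually gzip-compressed on disk, and whether the pybedtools temporary file contains only ASCII bytes. A rough sketch; `looks_gzipped` and `is_ascii_text` are hypothetical names, and whether compression is related to the failure is an assumption that still needs to be confirmed:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def is_ascii_text(path):
    """Return True if every byte in the file is ASCII (what `file` reports as ASCII text)."""
    with open(path, "rb") as fh:
        return fh.read().isascii()
```

Running `looks_gzipped` on the chromosome file and `is_ascii_text` on the temporary file, locally and in the container, would show whether the inputs differ or only the processing does.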


We need a working sequence context.

eboileau commented 3 months ago

Check https://github.com/arq5x/bedtools2/issues/56 for reference.