EnsemblGSOC / Ensembl-Repeat-Identification

A Deep Learning repository for predicting the location and type of repeat sequences in a genome.

Updates #1

Closed williamstark01 closed 2 years ago

williamstark01 commented 2 years ago

I've added pyenv and Poetry to manage the project's Python virtual environment and dependencies. They make the environment easy to replicate, which will be useful when running tasks on the cluster.

Here are their installation instructions; let me know if you have any trouble installing them: https://github.com/pyenv/pyenv#installation (pyenv) and https://python-poetry.org/docs/master/#installing-with-the-official-installer (Poetry)

After you install these two programs, you just need to run the following commands to set up the environment:

pyenv install 3.9.12

pyenv virtualenv 3.9.12 repeat_identification

# (verify that the repeat_identification environment is activated by running `pyenv version`)

poetry install
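As an extra sanity check after the commands above, a small Python sketch like the following could confirm that the active interpreter is the 3.9.12 build installed with pyenv. The helper name `interpreter_matches` and the constant `EXPECTED_VERSION` are made up for illustration and are not part of the repository.

```python
import sys

# Assumption: the version pinned by `pyenv install 3.9.12` above.
EXPECTED_VERSION = (3, 9, 12)


def interpreter_matches(version_info=None, expected=EXPECTED_VERSION):
    """Return True if the given (or running) interpreter has the expected version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor, and micro; ignore release level and serial.
    return tuple(version_info[:3]) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if interpreter_matches():
        print(f"OK: running Python {current}")
    else:
        print(f"Unexpected Python {current}; is the environment activated?")
```

Running this with the `repeat_identification` environment active should take the first branch; any other interpreter takes the second.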

The second change: I realized that all repeat families can be downloaded quickly from Dfam before the annotation downloads start, which speeds up that process. The families are also saved locally for subsequent runs.
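The "download once, reuse on later runs" idea can be sketched roughly as follows. This is a minimal illustration of the caching pattern, not the code in this pull request: `fetch_url`, `load_cached`, the cache path, and the URL passed in are all hypothetical, and the real Dfam endpoint is deliberately not shown.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Download the raw bytes at `url` with a plain HTTP GET."""
    with urlopen(url) as response:
        return response.read()


def load_cached(url: str, cache_path, fetch=fetch_url) -> bytes:
    """Return the cached file contents, downloading from `url` only if the file is missing."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous run already saved the data, so skip the download.
        return cache_path.read_bytes()
    data = fetch(url)
    # Create the cache directory if needed, then save for subsequent runs.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

The `fetch` parameter is only there so the download step can be swapped out, for example in tests, without touching the network.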

I also formatted the code with Black (a good code formatter for Python, handy to use).

Let me know in the comments or by email if you have any questions.

yangtcai commented 2 years ago

Hi, @williamstark01, it's awesome!!! I will try it as soon as possible :DDD

williamstark01 commented 2 years ago

Haha great! Let me know if you need help with setting up anything.