aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
http://atcgn.com:8080/quarTeT/home.html

added maximum TR length parameter for trf #22

Closed atotickov closed 8 months ago

atotickov commented 8 months ago

Dear aaranyue,

Thank you for creating the wonderful quarTeT program! I've been testing it for the past few days and getting excellent results. However, among the currently available parameters for trf, I couldn't find the -l parameter (maximum TR length), which is crucial for mammals, whose genome assemblies contain a high proportion of repetitive sequences. Without it, trf can get stuck and may not finish even after several weeks of running. I took the liberty of adding this parameter myself. If you don't mind, please consider accepting the merge request.

With respect and gratitude, atotickov.

Echoring commented 8 months ago

Dear atotickov,

Thank you so much for your generous suggestion! I had noticed that trf can get stuck for a long time on several chromosomes, but I had not found that this option could help. I also made a small adjustment to your code to prevent an error caused by the default value being None. The default value is now set to 3 million.

Thank you again for your help! Echoring
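
For readers following the thread, the idea discussed above (expose trf's -l option and give it an explicit default so the value is never None) can be sketched roughly as follows. This is a minimal illustration, not quarTeT's actual code: the function names, the option spelling --max-tr-length, and the numeric trf arguments (2 7 7 80 10 50 500) are assumptions chosen for the example. The only details taken from the thread are the -l option and the default of 3 million, which trf expresses as -l 3 because the value is given in millions.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line parser for a wrapper around trf."""
    parser = argparse.ArgumentParser(description="Illustrative trf wrapper")
    parser.add_argument("fasta", help="input FASTA file")
    # An explicit integer default means the value is never None, so it can
    # always be formatted into the trf command line.
    parser.add_argument(
        "-l", "--max-tr-length",
        dest="max_tr_length",
        type=int,
        default=3,
        help="maximum TR length expected by trf, in millions (default: 3)",
    )
    return parser


def build_trf_command(fasta: str, max_tr_length: int) -> List[str]:
    """Assemble a trf argument list; the numeric parameters are placeholders."""
    if max_tr_length < 1:
        raise ValueError("max_tr_length must be a positive number of millions")
    return [
        "trf", fasta,
        "2", "7", "7", "80", "10", "50", "500",  # assumed example settings
        "-l", str(max_tr_length),
    ]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(" ".join(build_trf_command(args.fasta, args.max_tr_length)))
```

Run with only a FASTA path, the sketch prints a command ending in -l 3; passing -l 6 would raise the limit for larger repeat arrays.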

atotickov commented 8 months ago

Dear Echoring,

Thank you for adding the parameter! I've come across a minor issue when attempting to install quarTeT in a conda environment via pip. The quarTeT repository lacks the setup.py file required for such installation methods. Would you mind adding it? This file won't impact the functionality of quarTeT; it would simply provide an additional installation method.

Example setup.py file:

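__author__ = 'name'

from pathlib import Path
from setuptools import setup, find_packages

# Placeholder dependency names; replace with the real requirements.
dependencies = ['dependencies_name1', 'dependencies_name2']

# Directory containing this setup.py, so README.md is found from any cwd.
here = Path(__file__).parent

setup(name='quarTeT',
      version='1.1.6',
      packages=find_packages(),
      author='name',
      author_email='email',
      install_requires=dependencies,
      long_description=(here / 'README.md').read_text(encoding='utf-8'),
      # Install every Python script in the repository except setup.py itself.
      scripts=[str(p) for p in sorted(Path('.').rglob('*.py'))
               if p.name != 'setup.py'])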

I would greatly appreciate it if you could add this file.

Thank you!

Best regards, atotickov

Echoring commented 8 months ago

Dear atotickov,

Thanks for your advice. However, I don't yet see the point of using setup.py. As I understand it, setup.py is used to install the required Python packages via pip and to copy the executable scripts to $PATH. But quarTeT uses no Python packages beyond the standard library, and the third-party software it requires cannot be found on pip. So I think setup.py is unnecessary here?
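
Since pip cannot install external command-line programs, one common complement to any Python-level install is a start-up check that the required executables are on $PATH. The sketch below illustrates that idea only; it is not part of quarTeT. The helper names are hypothetical, and the tool list is an assumption apart from trf, which is the one external program named in this thread.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def locate_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that could not be found on PATH, in input order."""
    return [name for name, path in locate_tools(names).items() if path is None]


if __name__ == "__main__":
    # Hypothetical requirement list; only trf is mentioned in the thread.
    required = ["trf"]
    absent = missing_tools(required)
    if absent:
        raise SystemExit("Missing external tools: " + ", ".join(absent))
    print("All external tools found.")
```

A check like this works the same way whether the tools were installed through conda, a system package manager, or by hand.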

Meanwhile, I tried creating a setup.py like the one below, but quarTeT.egg-info/scripts is not generated automatically after build and install, which results in the error pkg_resources.ResolutionError: Script 'scripts/quartet.py' not found in metadata at ....../quarTeT.egg-info. After I manually created this directory and copied the scripts into it, it worked, with the warning DeprecationWarning: pkg_resources is deprecated as an API.

import setuptools
import os
import sys

if sys.version_info.major != 3:
    raise EnvironmentError("quarTeT requires python3, and is not compatible with python2.")

setuptools.setup(
    name="quarTeT",
    version="1.1.6",
    author="Yunzhi Lin",
    author_email="linyunzhi20@gmail.com",
    description="A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification.",
    long_description=open('README.md').read(),
    url="http://www.atcgn.com:8080/quarTeT/home.html",
    packages=setuptools.find_packages(),
    py_modules=['quartet_util'],
    scripts=['quartet.py', 'quartet_assemblymapper.py', 'quartet_centrominer.py', 'quartet_gapfiller.py', 'quartet_teloexplorer.py'],
)

I would greatly appreciate it if you could point out the error here.

Thank you!

Best regards, Echoring

atotickov commented 8 months ago

Dear @Echoring,

Since quarTeT is currently unavailable in conda, using it in pipelines is difficult. As an alternative, quarTeT can be installed into a conda environment through a .yaml file, but this requires a setup.py in the GitHub repository.

I haven't encountered this error before, so I can't offer any specific advice. However, I'm also getting the "DeprecationWarning: pkg_resources is deprecated as an API." warning, but it doesn't seem to affect the program's functionality.

As a temporary measure, I added a setup.py file, which lists the packages I needed as dependencies, to my fork of quarTeT. This allowed me to use the program within the conda environment.

Best regards, atotickov