Closed nick-youngblut closed 2 months ago
It would be helpful if the requirements.txt file listed supported versions.
As for Install other essential tools
, I'm curious why those tools are not just listed in the conda create -n scNanoGPS
command:
mamba create -n scNanoGPS python=3 numpy scipy \
minimap2 samtools tabix spoa subread liqa longshot bcftools qualimap gffread
Can you please list the supported versions for each of these dependencies?
Note: there is a bioconda recipe for gffread.
Is ANNOVAR fully necessary, given that it is not (and likely will never be) available via conda? Maybe VEP, GEMINI, or VAAST could be used instead?
To be clear, you will likely have to convert the collection of scripts is this repo to a python package, given that you are currently just importing based on fixed, relative paths (e.g., scanner.py), but if you accept PRs, I can help with this.
At least with argparse, you don't have to redundantly set the default twice. For example, in:
parser.add_option("--tmp_dir", dest = "tmp_dir", nargs = 1, default = "tmp",
help = "Temporary folder name. "
"Default: tmp")
... with argparse, you don't need to set "Default: tmp"
in the help text, since it will automatically be added.
At least with argparse, you can auto-check whether the file exists:
parser.add_argument('input_file', type=argparse.FileType('r'), help='Input file')
...so then no need for:
if not os.path.isfile(options.CB_list):
os.system
is not a robust method of calling subprocess (example in your code.
An example of a more robust method using subprocess
:
import subprocess
def run_subprocess(command, timeout=None):
"""
Run a subprocess command and return the output.
Args:
command (list): The command to run as a list of strings.
timeout (int, optional): The timeout for the command in seconds.
Returns:
str: The standard output of the command.
Raises:
subprocess.CalledProcessError: If the command exits with a non-zero status.
subprocess.TimeoutExpired: If the command times out.
"""
try:
result = subprocess.run(
command,
check=True, # Raise CalledProcessError for non-zero exit codes
stdout=subprocess.PIPE, # Capture standard output
stderr=subprocess.PIPE, # Capture standard error
timeout=timeout, # Timeout for the command
text=True # Return output as a string
)
return result.stdout
except subprocess.CalledProcessError as e:
print(f"Command '{e.cmd}' returned non-zero exit status {e.returncode}.")
print(f"Standard output: {e.stdout}")
print(f"Standard error: {e.stderr}")
raise
except subprocess.TimeoutExpired as e:
print(f"Command '{e.cmd}' timed out after {timeout} seconds.")
print(f"Standard output: {e.stdout}")
print(f"Standard error: {e.stderr}")
raise
run_scNanoGPS.sh
There is no || exit 1
or && \
at the end of each command, so all commands will run even if some fail.
It wouldn't be too hard to create a simple Nextflow pipeline to run the python scripts in a more robust manner. Let me know if you'd like some help.
Hi Nick-youngblut,
Thank you for your precious suggestion.
It makes sense, but I cannot make the decision. I'll discuss with my mentor for the bioconda or Dockr release.
I'll review the requirement document, argparse, and system call in the future release. Thank you again.
Regards, Cheng-Kai
@shiauck what specific updates have been made to scNanoGPS (I listed quite a few suggestions above)?
conda is leveraged for installing (some) dependencies (at least, the recommended install method), but conda is not fully leveraged via a bioconda recipe for installing scNanoGPS.
Are there plans on creating a bioconda recipe for scNanoGPS in order to simplify the install to just:
A template created via ChatGPT: