ConesaLab / SQANTI3

Tool for the Quality Control of Long-Read Defined Transcriptomes
GNU General Public License v3.0
sqanti3_rescue.py: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType #253

ChiaraCaprioli closed 7 months ago

ChiaraCaprioli commented 7 months ago

Hello,

Thank you for this great tool. I am trying to run sqanti3_rescue.py with the following command:

python $PATH_TOOLS/SQANTI3-5.2/sqanti3_rescue.py ml \
$PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_MLresult_classification.txt \
--isoforms $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_corrected.fasta \
--gtf $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered.filtered.gtf \
-g $PBS_O_WORKDIR/benchmarking/gtf/gencode.v45.annotation.gtf \
-k $PBS_O_WORKDIR/ref/gencode.v45.annotation_classification.txt \
--mode full \
-e all \
-o sqanti3_ml_rescue_output \
-d $PBS_O_WORKDIR/${sample} \
-r $PBS_O_WORKDIR/${sample}/randomforest.RData \
-j 0.7 

and I am encountering the following error:

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Traceback (most recent call last):
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 517, in main
    if not os.path.isfile(args.refGenome):
  File "/hpcnfs/home/ieo4874/.conda/envs/SQANTI3.env/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Do you have any suggestion on how to solve this? Thank you,

C

alexpan00 commented 7 months ago

Hi,

Provide the full path to reference genome FASTA with the -f argument.

Alejandro.
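
For context, the traceback above comes from `os.path.isfile` receiving `None`, which is the value an argparse option holds when it is not passed on the command line. The following is a minimal sketch and not the actual SQANTI3 code: the parser and the `check_path` helper are hypothetical, and only the `-f`/`--refGenome` option name is taken from this thread.

```python
import argparse
import os


def build_parser():
    # Hypothetical stand-in for the rescue parser; only -f/--refGenome is modelled.
    parser = argparse.ArgumentParser(prog="rescue_sketch")
    parser.add_argument("-f", "--refGenome", default=None,
                        help="full path to the reference genome FASTA")
    return parser


def check_path(value, flag):
    """Return an error message for a missing or invalid path, else None."""
    if value is None:
        return f"{flag} was not provided"
    if not os.path.isfile(value):
        return f"{flag} does not point to an existing file: {value}"
    return None


# Without -f the attribute is None, and os.path.isfile(None) raises TypeError.
args = build_parser().parse_args([])
try:
    os.path.isfile(args.refGenome)
    raised = False
except TypeError:
    raised = True

missing_msg = check_path(args.refGenome, "-f/--refGenome")
print(raised, missing_msg)
```

Checking for `None` before calling `os.path.isfile` is one way to turn the bare `TypeError` into a message that names the missing option.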

sonalhenson commented 7 months ago

Hi Alejandro, I get the same error with the rules mode, despite giving the full path. My command is as follows:

sqanti3_rescue.py rules \
--isoforms ${OUTDIR}/corrected.fasta \
--gtf ${OUTDIR}/filtered/filtered.gtf \
--refGTF $REF_GTF \
--refGenome $REF_FA \
--refClassif ${OUTDIR}/classification.txt \
--mode full \
-o ds \
-d ${OUTDIR}/rescued \
${OUTDIR}/filtered/RulesFilter_result_classification.txt

I've also run the command directly on the command line, using absolute paths, but I get the same error. Any insights into what I might be missing?

Thanks

alexpan00 commented 7 months ago

Hi @sonalhenson,

If your error looks like this:

  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 549, in main
    if not os.path.isfile(args.json):
  File "/home/apadepe/.conda/envs/sq3/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

It is because you are missing the -j argument. In rules mode, this is the path to the rules filter in JSON format. If you used the default rules, you can find this file in utilities/filter/filter_default.json

Hope this fixes your problem, Alejandro.

sonalhenson commented 7 months ago

Hi @alexpan00, That was exactly the error and your solution resolved it.

I much appreciate your very rapid assistance.

All best Sonal

francicco commented 4 months ago

Hi @alexpan00,

I'm having the same problem:

sqanti3_rescue.py ml MLfilter_output/${SP}_MLresult_classification.txt \
   -j 0.7 --isoforms $SP.SQANTI3qc_corrected.fasta \
   --gtf MLfilter_output/$SP.filtered.gtf \
   -g $GTF \
   --mode full \
   -f $ASSEMBLY \
   -o MLrescue_output \
   -r MLfilter_output/randomforest.RData
Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 521, in main
    if not os.path.isfile(args.refClassif):
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I don't think I have to use a filter_default.json with the ml option. Cheers F

alexpan00 commented 4 months ago

Hi @francicco,

You are missing the --refClassif parameter in your call to the rescue script.

Alejandro
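
The three tracebacks in this thread share a pattern: the line just above `os.stat(path)` names the `args` attribute that was `None`, and that attribute points to the missing option. As a rough sketch, a small helper can pull that name out of a pasted traceback. The function name `missing_argument` and the hint table are illustrative only; the table lists just the attribute and option pairs mentioned in this thread and is not a complete list of rescue options.

```python
import re

# Attribute -> option, limited to the pairs mentioned in this thread.
FLAG_HINTS = {
    "refGenome": "-f / --refGenome (reference genome FASTA)",
    "json": "-j (rules filter JSON, rules mode)",
    "refClassif": "--refClassif (reference classification file)",
}

ISFILE_CALL = re.compile(r"os\.path\.isfile\(args\.(\w+)\)")


def missing_argument(traceback_text):
    """Return (attribute, hint) for the last os.path.isfile(args.X) call, or None."""
    if "not NoneType" not in traceback_text:
        return None
    matches = ISFILE_CALL.findall(traceback_text)
    if not matches:
        return None
    attr = matches[-1]
    return attr, FLAG_HINTS.get(attr, "unknown option, check --help")


example = '''
  File "sqanti3_rescue.py", line 521, in main
    if not os.path.isfile(args.refClassif):
  File "genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
'''
print(missing_argument(example))
```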

francicco commented 4 months ago

Hi @alexpan00,

Thank you! How do I generate it? sqanti3_qc.py takes the isoforms (in FASTA/FASTQ or GTF format) and the reference annotation. How do I run sqanti3_qc.py to generate the refClassif file?

Cheers F

francicco commented 4 months ago

I tried one way (not sure if it was the best one), then gave the resulting classification file to sqanti3_rescue.py, and I got this:

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Output directory not defined. All the outputs will be stored at /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output directory

Automatic rescue run via the following command:

/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full

Loading required package: magrittr

---------------------------------------------------------------

        INITIATING SQANTI3 RESCUE...

---------------------------------------------------------------

    --mode full:

        Full rescue mode selected!

        Automatic rescue activated for artifact FSM transcripts.

        Additional rescue steps will be performed for ISM, NIC and NNC artifacts.

---------------------------------------------------------------

    READING FILTER CLASSIFICATION FILE...

Rows: 244753 Columns: 53
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (16): isoform, chrom, strand, structural_category, associated_gene, asso...
dbl (21): length, exons, ref_length, ref_exons, diff_to_TSS, diff_to_TTS, di...
lgl (16): RTS_stage, FL, n_indels, n_indels_junc, bite, iso_exp, gene_exp, r...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

---------------------------------------------------------------

---------------------------------------------------------------

    PERFORMING AUTOMATIC RESCUE...

---------------------------------------------------------------

    ***NOTE: you have set -e all:

        All mono-exonic artifact transcripts will be considered for rescue.

    Rescuing references associated to mono-exon FSM...

    Including mono-exon ISM as rescue candidates...

    Finding FSM-supported reference transcripts lost after filtering...
Error in `dplyr::filter()`:
ℹ In argument: `isoform %in% classif_ism_fsm$isoform`.
Caused by error:
! object 'isoform' not found
Backtrace:
     ▆
  1. ├─rescue %>% ...
  2. ├─dplyr::filter(., isoform %in% classif_ism_fsm$isoform)
  3. ├─dplyr:::filter.data.frame(., isoform %in% classif_ism_fsm$isoform)
  4. │ └─dplyr:::filter_rows(.data, dots, by)
  5. │   └─dplyr:::filter_eval(...)
  6. │     ├─base::withCallingHandlers(...)
  7. │     └─mask$eval_all_filter(dots, env_filter)
  8. │       └─dplyr (local) eval()
  9. ├─isoform %in% classif_ism_fsm$isoform
 10. └─base::.handleSimpleError(...)
 11.   └─dplyr (local) h(simpleError(msg, call))
 12.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
Execution halted
Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full' returned non-zero exit status 1.

Not sure what happened... Thank you for your help Cheers F

alexpan00 commented 4 months ago

Hi @francicco ,

You generate the reference classification by running the sqanti3_qc script with your reference GTF as both the isoforms input and the reference annotation. The idea is to supply the same orthogonal data (if you included any) that you used when running QC on your transcriptome.

You can find more information in this discussion and in the wiki.

Alejandro
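
As a rough illustration of the reply above, the reference classification run could be assembled like this. It is a sketch under assumptions: it assumes `sqanti3_qc.py` takes its positional inputs in the order isoforms, annotation, genome, and accepts `-o` and `-d` for the output prefix and directory (check `sqanti3_qc.py --help` for your version). The helper name `build_ref_qc_command` and all file names are placeholders.

```python
import shlex


def build_ref_qc_command(ref_gtf, ref_genome, out_prefix, out_dir, extra_args=()):
    """Build a sqanti3_qc.py call that classifies the reference against itself."""
    cmd = [
        "sqanti3_qc.py",
        ref_gtf,      # reference GTF passed as the "isoforms" input
        ref_gtf,      # the same GTF passed as the reference annotation
        ref_genome,   # reference genome FASTA
        "-o", out_prefix,
        "-d", out_dir,
    ]
    # Orthogonal data options (if any) should match the run on your transcriptome.
    cmd.extend(extra_args)
    return cmd


cmd = build_ref_qc_command("ref.gtf", "genome.fa", "ref", "ref_qc")
print(shlex.join(cmd))
```

The classification file written by that run is then the one to pass to `--refClassif` in the rescue call.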

francicco commented 4 months ago

OK, I did it right then! But I still get that error during rescue, and I don't know why. F

francicco commented 4 months ago

I've found the bug! The classification file from SQANTI3_filter.py has a column named Isoform instead of isoform. I edited it and now it runs.

I'll let you know if I find any other bug.

Cheers F
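
The manual edit described above can be scripted. This is a minimal sketch, assuming the classification file is tab-delimited with the column names on the first line (consistent with the `Delimiter: "\t"` line in the log). The function name `fix_isoform_header` and the file names are hypothetical.

```python
import os
import tempfile


def fix_isoform_header(in_path, out_path, old="Isoform", new="isoform"):
    """Copy a tab-delimited file, renaming one column in the header line.

    Returns True if the header was changed, False if `old` was not found.
    """
    with open(in_path, "r", encoding="utf-8") as src:
        header = src.readline().rstrip("\n").split("\t")
        changed = old in header
        header = [new if name == old else name for name in header]
        with open(out_path, "w", encoding="utf-8") as dst:
            dst.write("\t".join(header) + "\n")
            # Data rows are copied unchanged.
            for line in src:
                dst.write(line)
    return changed


# Small self-check on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "classification.txt")
    dst_path = os.path.join(tmp, "classification.fixed.txt")
    with open(src_path, "w", encoding="utf-8") as fh:
        fh.write("Isoform\tchrom\tstrand\nPB.1.1\tchr1\t+\n")
    changed = fix_isoform_header(src_path, dst_path)
    with open(dst_path, encoding="utf-8") as fh:
        fixed_lines = fh.read().splitlines()
print(changed, fixed_lines[0])
```

Writing to a new file rather than editing in place keeps the original filter output available for comparison.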