Gabaldonlab / perSVade

perSVade: personalized Structural Variation detection
GNU General Public License v3.0

Error in sv_functions.get_integrated_SV_CNV_df_severalSamples #20

Closed: nescott closed this issue 4 months ago

nescott commented 7 months ago

Thank you for developing this pipeline, describing the installation options clearly, and for having such a detailed wiki!

When I run sv_functions.get_integrated_SV_CNV_df_severalSamples for only 10 samples, it finishes successfully. But for 100 samples, it fails with the following error log:

Traceback (most recent call last):
  File "integrate_sv.py", line 11, in <module>
    threads=16)
  File "/perSVade/scripts/sv_functions.py", line 21085, in get_integrated_SV_CNV_df_severalSamples
    SV_CNV = get_SV_CNV_df_with_common_variantID_acrossSamples(SV_CNV, outdir_common_variantID_acrossSamples, pct_overlap, tol_bp, threads)
  File "/perSVade/scripts/sv_functions.py", line 19243, in get_SV_CNV_df_with_common_variantID_acrossSamples
    df_bed_all = pd.concat(map(get_bed_df_from_variantID, all_variantIDs)).sort_values(by=["chromosome", "start", "end"])
  File "/opt/conda/envs/perSVade_env/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "/opt/conda/envs/perSVade_env/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 259, in __init__
    objs = list(objs)
  File "/perSVade/scripts/sv_functions.py", line 19102, in get_bed_df_from_variantID
    posA, posB = varID.split("|")[1].split("-")
ValueError: too many values to unpack (expected 2)

Any suggestions would be appreciated!
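The last frame of the traceback can be reproduced in isolation, which may help narrow down which variant IDs trigger it. The sketch below reuses only the expression shown in the traceback (`varID.split("|")[1].split("-")`); the variant ID strings and the helper names `has_two_part_interval` and `find_problem_ids` are hypothetical, invented for illustration, and the real perSVade ID format may differ.

```python
def has_two_part_interval(var_id):
    """Return True if the expression from the traceback would unpack into two values."""
    fields = var_id.split("|")
    if len(fields) < 2:
        return False
    return len(fields[1].split("-")) == 2


def find_problem_ids(variant_ids):
    """Collect the IDs for which `posA, posB = ...` would raise ValueError."""
    return [v for v in variant_ids if not has_two_part_interval(v)]


# Hypothetical IDs (assumption: not taken from real perSVade output).
example_ids = [
    "DEL|chr1:100-200",
    "DEL|chr-2:100-200",  # an extra hyphen yields three pieces instead of two
]

# Reproduce the error message from the traceback on a single field.
try:
    posA, posB = "chr-2:100-200".split("-")
except ValueError as err:
    print(err)

print(find_problem_ids(example_ids))
```

Running a check like this over the full list of variant IDs from the 100-sample run would show whether a few unusual IDs (absent from the 10-sample run) are responsible.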

MikiSchikora commented 7 months ago

Hi,

I am happy to see that it can be useful! I am sorry about this function: it was not implemented carefully enough to be general across all types of samples, and was more of an add-on to the pipeline. The latest release does not yet have a fully functioning way to integrate variants from multiple samples.

I am working on a new release that will include this. Although I cannot publish it yet (I am still implementing some things), the current git commit already has the module 'integrate_several_samples', which can take various perSVade outputs (for single samples) and generate integrated variant files. I'd recommend using this module on your data. You can obtain the latest commit with git clone https://github.com/Gabaldonlab/perSVade, and run the wrapper <perSVade>/scripts/perSVade integrate_several_samples -h to see how to use it. This should work in the perSVade environments of the latest release (conda, docker or singularity). If you are using docker or singularity, note that you'll need to mount the cloned perSVade repository inside the container to use it.

Hope this helps,

Miquel Àngel Schikora BSC-IRB

nescott commented 7 months ago

Thanks for the recommendation, and I'm looking forward to the new release!

MikiSchikora commented 7 months ago

Hi,

Great, so let's see if the integrate_several_samples module works for your purposes.

Thanks, Miki

alipirani88 commented 7 months ago

Hi,

Can you include an example command on how to use integrate_several_samples?

Thanks,

MikiSchikora commented 7 months ago

Hi,

It depends heavily on the type of variants you want to integrate. It is run with <perSVade> integrate_several_samples -o <outdir> --paths_table <.csv file with the paths> -r <ref genome> -mchr <mito chrom> --repeats_file <repeats> -p <ploidy> .... I recommend carefully reading the help message (<perSVade> integrate_several_samples -h) to understand how to set all these arguments.

Thanks, Miquel Àngel
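For anyone scripting this call, the command template above could be assembled from Python roughly as follows. This is only a sketch: the flags are the ones quoted in the comment above, the module name is the integrate_several_samples used earlier in the thread, and every path and value is a hypothetical placeholder. The template ends in "....", so further arguments may be needed, and the expected columns of the paths table are not described in this thread; the -h output is the authority on both.

```python
import shlex


def build_integrate_command(persvade, outdir, paths_table, ref_genome,
                            mito_chrom, repeats_file, ploidy):
    """Assemble the argument list for the wrapper; nothing is executed here."""
    return [
        persvade, "integrate_several_samples",
        "-o", outdir,
        "--paths_table", paths_table,
        "-r", ref_genome,
        "-mchr", mito_chrom,
        "--repeats_file", repeats_file,
        "-p", str(ploidy),
    ]


# All values below are hypothetical placeholders (assumptions for illustration).
cmd = build_integrate_command(
    persvade="/path/to/perSVade/scripts/perSVade",
    outdir="integrated_out",
    paths_table="paths.csv",
    ref_genome="reference.fasta",
    mito_chrom="mito_chromosome_name",
    repeats_file="repeats.tab",
    ploidy=2,
)

# Print a shell-quoted version for inspection before running it by hand
# or passing `cmd` to subprocess.run.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list, rather than a single string, avoids quoting problems if any path contains spaces.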

MikiSchikora commented 4 months ago

Good morning,

Were you able to solve this?

Best,

Miquel Àngel