WGLab / PhenoSV

PhenoSV: Interpretable phenotype-aware model for the prioritization of genes affected by structural variants.
MIT License

Issues with BED Files #12

Open poddarharsh15 opened 1 month ago

poddarharsh15 commented 1 month ago

Hi @Karenxzr

I'm experiencing issues with BED files when running the PhenoSV module, as shown in the errors below. The errors probably come from the BED file format, but I have not been able to resolve them.

Could you please take a look and suggest possible solutions?

Thank you for your assistance!

python3 phenosv/model/phenosv.py --sv_file ~/structural_varinats/merged_vcfs/output.bed --target_folder test1/ --target_file_name Final_out

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

output.zip

poddarharsh15 commented 1 month ago

Hi @Karenxzr, I have also tried several times with the .csv format, but I am still getting the same error. Please have a look: output.csv

python3 phenosv/model/phenosv.py --sv_file ~/structural_varinats/merged_vcfs/output.csv --target_folder test1/ --target_file_name Final_out

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

Karenxzr commented 1 month ago

Hi, I tested the top 20 lines of your output.csv file and it worked fine. Please use an absolute path for --sv_file. It seems PhenoSV did not read your input data correctly.

python3 phenosv/model/phenosv.py --sv_file /Users/zhuoranx/Documents/ResearchProject/PhenoSV/PhenoSV/data/test2.csv --target_folder /Users/zhuoranx/Documents/ResearchProject/PhenoSV/PhenoSV/data --target_file_name test_out
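Because the suggestion here is to pass absolute paths, it can help to resolve them programmatically before launching the run. The sketch below assumes only the three flags used in this thread (--sv_file, --target_folder, --target_file_name). The helper name build_phenosv_cmd and its default script path are hypothetical, and it only builds the argument list rather than running PhenoSV.

```python
from pathlib import Path


def build_phenosv_cmd(sv_file, target_folder, target_file_name,
                      script="phenosv/model/phenosv.py"):
    """Return an argument list for phenosv.py with absolute paths.

    Hypothetical helper: only the three flags come from this thread.
    """
    # expanduser() handles "~"; resolve() turns relative paths into absolute ones
    sv_path = Path(sv_file).expanduser().resolve()
    out_dir = Path(target_folder).expanduser().resolve()
    if not sv_path.is_file():
        # Fail early instead of letting PhenoSV read nothing
        raise FileNotFoundError(f"SV file not found: {sv_path}")
    return ["python3", script,
            "--sv_file", str(sv_path),
            "--target_folder", str(out_dir),
            "--target_file_name", target_file_name]
```

Something like subprocess.run(build_phenosv_cmd("~/data/output.csv", "test1/", "Final_out"), check=True) would then launch the run from the repository root.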

poddarharsh15 commented 1 month ago


Hi @Karenxzr, thank you for your fast response. I have tried several runs using absolute paths, but I still get the same error. Please have a look :(( Do I need to run pip install . first?

python3 phenosv/model/phenosv.py --sv_file /home/tigem/h.poddar/structural_varinats/PhenoSV/data/output.csv --target_folder /home/tigem/h.poddar/structural_varinats/PhenoSV/data/ --target_file_name test1

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

poddarharsh15 commented 1 month ago

UPDATE: I found that the issue only occurs when processing the entire output.csv file, which contains almost 13,000 structural variants (SVs). When working with a subset of 30 lines from the same file, everything functions correctly without any errors. It seems the problem arises when handling a larger dataset. Could you please advise on possible solutions to address this?

Test run results: test_out.csv

Karenxzr commented 1 month ago

Hi, as mentioned in the tutorial, you can split the input CSV file and run several smaller CSV files in parallel. An example is below; you can increase the number of threads from 4 to something like 32.

bash phenosv/model/phenosv.sh 'path/to/sv/data.csv' 'folder/path/to/store/results' 4 'HP:0000707,HP:0007598'

In addition, the source code is here: https://github.com/WGLab/PhenoSV/blob/main/phenosv/model/phenosv.sh. If you use SLURM, you can split the input file as in the shell script and submit a job array.
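For readers who prefer Python over the shell script for the splitting step, here is a minimal sketch. It is not how phenosv.sh does it; the chunk file names, the input_files.txt manifest name and the has_header option are assumptions. The manifest holds one absolute path per line, which is the layout the mapfile-based SLURM script later in this thread reads, with array indices starting at 0.

```python
from pathlib import Path


def split_sv_csv(sv_csv, out_dir, n_chunks, has_header=False):
    """Split an SV CSV into at most n_chunks files of near-equal size.

    Chunk names (chunk_000.csv, ...) and the manifest name input_files.txt
    are hypothetical; the manifest lists one absolute path per line.
    """
    lines = Path(sv_csv).read_text().splitlines()
    # If the file has a header, copy it into every chunk
    header = [lines.pop(0)] if has_header and lines else []
    rows = [line for line in lines if line.strip()]  # drop blank lines
    if not rows:
        raise ValueError(f"no SV rows found in {sv_csv}")
    out = Path(out_dir).expanduser().resolve()
    out.mkdir(parents=True, exist_ok=True)
    # Ceiling division so that no more than n_chunks files are written
    size = -(-len(rows) // max(1, min(n_chunks, len(rows))))
    paths = []
    for start in range(0, len(rows), size):
        path = out / f"chunk_{len(paths):03d}.csv"
        path.write_text("\n".join(header + rows[start:start + size]) + "\n")
        paths.append(str(path))
    (out / "input_files.txt").write_text("\n".join(paths) + "\n")
    return paths
```

With ten rows and n_chunks=4 this writes chunks of 3, 3, 3 and 1 rows, so a job array of --array=0-3 would cover them.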

Another possibility is that some abnormal rows in your data caused this error. If you split the file, you will likely be able to identify the offending row.
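To locate such rows without bisecting the file by hand, a small scanner can report line numbers. This is a sketch that assumes comma-separated rows in the CHR, START, END, ID, SVTYPE order seen in the phenosv.py traceback. The set of accepted SVTYPE values is a guess to adjust, and a header line, if present, will itself be reported.

```python
import csv

# Column order taken from the phenosv.py traceback in this thread;
# KNOWN_SVTYPES is an assumption, not PhenoSV's own list.
EXPECTED_COLUMNS = ["CHR", "START", "END", "ID", "SVTYPE"]
KNOWN_SVTYPES = {"DEL", "DUP", "INS", "INV"}


def find_abnormal_rows(sv_csv):
    """Yield (line_number, reason) for rows that look malformed."""
    with open(sv_csv, newline="") as handle:
        for lineno, row in enumerate(csv.reader(handle), start=1):
            # Blank lines, tab-separated lines, or extra commas change the count
            if len(row) != len(EXPECTED_COLUMNS):
                yield lineno, f"expected {len(EXPECTED_COLUMNS)} columns, found {len(row)}"
                continue
            chrom, start, end, sv_id, svtype = row
            if not (start.isdigit() and end.isdigit()):
                yield lineno, "START/END are not non-negative integers"
            elif int(start) > int(end):
                yield lineno, "START is greater than END"
            if svtype not in KNOWN_SVTYPES:
                # Truncate long values such as inserted sequences
                yield lineno, f"unrecognized SVTYPE {svtype[:20]!r}"
```

Printing each (line, reason) pair from find_abnormal_rows("output.csv") gives a list of suspect rows to inspect or drop.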

poddarharsh15 commented 1 month ago

Hi @Karenxzr, do you have any suggestions for converting VCF files to CSV or BED format? Currently, I am using vcf2bed to convert VCF files to BED format and then manipulating the data to create a CSV file, as shown in the sample data. Any advice or alternative approaches would be greatly appreciated. Thank you!
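One alternative to the vcf2bed round trip is to read the VCF directly. The sketch below is hypothetical and assumes the five-column CHR, START, END, ID, SVTYPE layout without a header, the INFO keys SVTYPE and END, POS used unchanged as START (check PhenoSV's documentation for the coordinate convention it expects), and a fallback of END = POS for insertions that lack END. Records it cannot classify are counted as skipped rather than written, which would also keep sequence-valued entries like the one reported later in this thread out of the CSV.

```python
import csv
import gzip

# Assumptions (not confirmed by PhenoSV docs): no header row, 1-based VCF POS
# used as START, and only these SVTYPE values are kept.
KNOWN_SVTYPES = {"DEL", "DUP", "INS", "INV"}


def parse_info(info):
    """Turn 'SVTYPE=DEL;END=500;IMPRECISE' into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_to_phenosv_csv(vcf_path, csv_path):
    """Write CHR,START,END,ID,SVTYPE rows; return (written, skipped)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    written = skipped = 0
    with opener(vcf_path, "rt") as vcf, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for line in vcf:
            if line.startswith("#") or not line.strip():
                continue  # meta-information, column header, blank lines
            cols = line.rstrip("\n").split("\t")
            chrom, pos, sv_id, info = cols[0], cols[1], cols[2], cols[7]
            fields = parse_info(info)
            svtype = fields.get("SVTYPE")
            # Insertions often carry no END; use POS for them (assumption)
            end = fields.get("END", pos if svtype == "INS" else None)
            if svtype not in KNOWN_SVTYPES or end is None:
                skipped += 1  # e.g. BND or sequence-only records
                continue
            if sv_id == ".":
                sv_id = f"{chrom}_{pos}_{svtype}"  # hypothetical ID scheme
            writer.writerow([chrom, pos, end, sv_id, svtype])
            written += 1
    return written, skipped
```

Reporting the skipped count makes it easy to notice when many records were dropped and the converter's assumptions need revisiting.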

poddarharsh15 commented 1 month ago

I have identified the issue with my input.csv file: it contained entries with unrecognized SVTYPE values (e.g., ACGGGGCAGGGAGGGCCCCTCTAGAAGCCACCTGTGCAGAC). After removing those and making sure the CSV file only includes known SVTYPEs, I am still encountering an error. Could you please suggest some ideas or solutions? PS: PhenoSV still runs after emitting this error and generates a CSV output with results; please see the CSV file for reference.

combined.csv.out.csv
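The cleanup described here can be scripted with pandas, which the traceback shows is installed in the phenosv environment. This is a minimal sketch that assumes a headerless five-column CSV; KNOWN_SVTYPES is an assumed list, not one taken from PhenoSV's code.

```python
import pandas as pd

COLUMNS = ["CHR", "START", "END", "ID", "SVTYPE"]  # order from the traceback
KNOWN_SVTYPES = {"DEL", "DUP", "INS", "INV"}  # assumption: adjust as needed


def drop_unrecognized_svtypes(in_csv, out_csv):
    """Keep rows with a known SVTYPE; return the number of rows removed."""
    # dtype=str keeps coordinates and IDs exactly as written
    df = pd.read_csv(in_csv, header=None, names=COLUMNS, dtype=str)
    keep = df["SVTYPE"].isin(KNOWN_SVTYPES)
    df[keep].to_csv(out_csv, header=False, index=False)
    return int((~keep).sum())
```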

Thank you in advance for your help!

Command applied using SLURM:

#!/bin/bash
# Run one PhenoSV job per SLURM array task; array indices must start at 0
eval "$(conda shell.bash hook)"
conda activate phenosv

CONFIG_FILE="/home/tigem/h.poddar/structural_varinats/PhenoSV/input_files.txt"   # one SV CSV path per line
TARGET_FOLDER="/home/tigem/h.poddar/structural_varinats/PhenoSV/final_test"
phenosvsh="/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.sh"
THREADS=64

# Pick this task's input file; the bash array is 0-based
mapfile -t INPUT_FILES < "$CONFIG_FILE"
SV_FILE="${INPUT_FILES[$SLURM_ARRAY_TASK_ID]}"
if [[ -z "${SV_FILE}" ]]; then
    echo "No SV file for task ${SLURM_ARRAY_TASK_ID} in ${CONFIG_FILE}" >&2
    exit 1
fi

echo "Processing SV file: ${SV_FILE}"
echo "Target folder: ${TARGET_FOLDER}"

bash "${phenosvsh}" "${SV_FILE}" "${TARGET_FOLDER}" "${THREADS}" 'HP:0000707,HP:0007598' || exit 1

echo "PhenoSV processing completed for ${SV_FILE}!"

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 122, in main
    sv_df.columns = ['CHR', 'START', 'END', 'ID', 'SVTYPE']
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/generic.py", line 5588, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/generic.py", line 769, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 214, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 1 elements, new values have 5 elements
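The final traceback shows that the DataFrame had a single column when phenosv.py assigned the five column names. One way to get exactly this error is feeding tab-separated text (for example, BED-style output) to a reader that splits on commas. The sketch below reproduces the error message under that assumption; it is not a confirmed diagnosis of this run, and checking the failing chunk's separator and column count is a reasonable first step.

```python
import io

import pandas as pd

# Tab-separated text read with pandas' default comma separator collapses each
# line into a single column, so assigning five column names fails.
tsv_text = "chr1\t100\t200\tsv1\tDEL\nchr2\t10\t20\tsv2\tDUP\n"
sv_df = pd.read_csv(io.StringIO(tsv_text), header=None)
print(sv_df.shape)  # (2, 1)

try:
    sv_df.columns = ["CHR", "START", "END", "ID", "SVTYPE"]
except ValueError as err:
    print(err)  # Length mismatch: Expected axis has 1 elements, new values have 5 elements

# Reading with the matching separator gives the expected five columns.
fixed = pd.read_csv(io.StringIO(tsv_text), header=None, sep="\t")
fixed.columns = ["CHR", "START", "END", "ID", "SVTYPE"]
print(fixed.shape)  # (2, 5)
```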