TheJacksonLaboratory / SVE

GNU General Public License v3.0

File name too long #26

Open marctormo opened 5 years ago

marctormo commented 5 years ago

Hi there,

We are trying to run fusorSV from the docker (in singularity) and we are getting this error:

    finished modeling in 75.12 sec
    Traceback (most recent call last):
      File "/tools/SVE/scripts/FusorSV/FusorSV.py", line 400, in <module>
        fusor.export_fusion_model(B,J,D,E,alpha,len(snames),K,model_path)
      File "/tools/SVE/scripts/FusorSV/fusor_utils.py", line 183, in export_fusion_model
        with open(path,'wb') as f:
    IOError: [Errno 36] File name too long: './models/hg38.sorted..0123456789ABCDEF101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495.pickle'

We are trying to understand why it generates this, but don't know how to modify this script (/tools/SVE/scripts/FusorSV/fusor_utils.py):

    #given a partition P[t][b][c][s] write a json for each
    def write_partitions_by_sample(sname_partition_path,P):
        for t in P:
            for b in P[t]:
                for c in P[t][b]:
                    for s in P[t][b][c]:
                        path = sname_partition_path+'_S%s_T%s_B%s.pickle'%(c,t,b)
                        S = {t:{b:{c:{s:P[t][b][c][s]}}}}
                        with open(path,'wb') as f:
                            pickle.dump(S,f)
        return True

Could you help us, please?

Many thanks!
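Editor's note: the final path component in the traceback above is roughly 300 characters long, while common Linux filesystems limit a single file name to 255 bytes, which is what `Errno 36` reports. The following is a minimal sketch of one generic workaround: replacing the tail of an over-long file name with a digest. It is not taken from the SVE code base. The helper name `shorten_filename`, the 255-byte limit and the 32-character prefix are all assumptions chosen for illustration.

```python
import hashlib
import os

NAME_MAX = 255  # assumed per-component limit in bytes (typical for ext4/XFS)


def shorten_filename(path, limit=NAME_MAX, prefix_len=32):
    """Hypothetical helper: return `path` unchanged if its last component
    fits within `limit` bytes; otherwise keep a short readable prefix and
    replace the rest of the stem with a SHA-1 digest."""
    directory, name = os.path.split(path)
    if len(name.encode("utf-8")) <= limit:
        return path
    stem, ext = os.path.splitext(name)
    # The digest is computed over the full stem, so distinct long names
    # still map to distinct short names (barring hash collisions).
    digest = hashlib.sha1(stem.encode("utf-8")).hexdigest()
    return os.path.join(directory, stem[:prefix_len] + "_" + digest + ext)
```

A wrapper like this would have to be applied wherever the model path is built and wherever it is later read back, so that both sides agree on the shortened name.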

lslochov commented 5 years ago

Hi @marctormo , if you're looking to use SVE with Singularity, we recently built an experimental SVE Singularity image using the latest development branch source code. It's not as extensively tested, but it's possible that running the Singularity image directly would have fewer issues. If you like, I can share this Singularity image with you.

marctormo commented 5 years ago

Hi @lslochov , Yes, it would be nice to get this image.

Many thanks!

lslochov commented 5 years ago

Hi @marctormo , here's the download link:

https://thejacksonlaboratory.box.com/s/6mqv0yevz092rjpcp3e98bnl5i6qg7ah

marctormo commented 5 years ago

Thanks, I'll check it out!

marctormo commented 5 years ago

Hi @lslochov ,

I built the image, but now I'm getting this error:

    no coordinate offset map specified
    Traceback (most recent call last):
      File "/tools/SVE/scripts/FusorSV/FusorSV.py", line 56, in <module>
        raise IOError
    IOError

I saw in another thread that it could be caused by the GRCh38 version, but it's not yet clear to me how to solve this issue. Any suggestions?

Many thanks!

marctormo commented 5 years ago

Hi @lslochov ,

Is there any update about this error?

Thanks, Marc

lslochov commented 5 years ago

Hi @marctormo , the version of FusorSV that's included in the Singularity image uses new command line parameters that require the user to provide a coordinate offset map and an SV mask for the reference genome that they're using. The hg19 versions of these files are provided with the included data bundle, but we haven't provided the hg38 files yet.

The new FusorSV command line is:

python scripts/FusorSV/FusorSV.py [-f model_file.pickle] -L DEFAULT -r <FASTA> --coor <coordinate_offset_map_file.json> --sv_mask <SV_mask_file.json> -i <vcfFiles>/ -p <THREADS> -o <OUT_DIR>

Hence, your error came from not including the "--coor" argument.

marctormo commented 5 years ago

Hi @lslochov ,

I'm sorry, but I don't know what "a coordinate offset map" means. Could you please tell me how to create one, or point me to an annotated example? Which file in the /data/ folder corresponds to this map for hg19?

Many thanks!

lslochov commented 5 years ago

The coordinate offset map helps FusorSV find a particular chromosome in a whole genome FASTA file. An offset is given for each chromosome, so FusorSV knows how many characters it needs to "jump over" in the FASTA file to get to that chromosome, so it doesn't have to search the whole file. This greatly speeds up running time. An example is given in the "data" folder, called: "human_g1k_v37_decoy_coordinates.json"
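Editor's note: the idea described above can be sketched as follows. This is a minimal illustration that records, for each sequence in a FASTA file, the byte offset at which its sequence data begins, so that a reader can `seek` straight to it. The function name `fasta_offsets` is hypothetical, and the exact JSON layout that FusorSV expects is not shown in this thread, so any output should be compared against the bundled "human_g1k_v37_decoy_coordinates.json" before use.

```python
import json


def fasta_offsets(fasta_path):
    """Hypothetical helper: map each sequence name to the byte offset of
    the first line after its '>' header."""
    offsets = {}
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break  # end of file
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:
                    # Name is the first whitespace-delimited token of the header.
                    # tell() now points at the start of the sequence data.
                    offsets[parts[0].decode("ascii")] = fh.tell()
    return offsets


def write_offsets(fasta_path, json_path):
    """Write the offsets as a flat JSON object (assumed layout)."""
    with open(json_path, "w") as out:
        json.dump(fasta_offsets(fasta_path), out, indent=2)
```

With such a map, reading a chromosome amounts to opening the FASTA in binary mode, calling `fh.seek(offsets[name])`, and reading lines until the next header.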

marctormo commented 5 years ago

Hi @lslochov ,

We created a JSON file with our coordinates, but we are still getting an error:

      File "/tools/SVE/scripts/FusorSV/FusorSV.py", line 367, in <module>
        R += ru.get_mask_regions(args.sv_mask,O) #svmask from ref complexity
      File "/tools/SVE/scripts/FusorSV/read_utils.py", line 166, in get_mask_regions
        for i in range(len(M[k])):
    TypeError: object of type 'int' has no len()

Do you know what is wrong with the file?

Thanks!
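Editor's note: the `TypeError` above is raised at `len(M[k])`, which suggests that at least one value in the file passed as `--sv_mask` is an integer where `get_mask_regions` expects something with a length. One possible cause, which the thread does not confirm, is that a file with one integer per chromosome (like a coordinate map) was supplied as the SV mask. The sketch below assumes that the mask is a JSON object whose values should be lists. The helper `find_non_list_values` is hypothetical and not part of SVE, and the expected content of each list is not specified in this thread.

```python
import json


def find_non_list_values(json_path):
    """Hypothetical check: return {key: type name} for every top-level
    value in the JSON object that is not a list."""
    with open(json_path) as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {k: type(v).__name__ for k, v in data.items()
            if not isinstance(v, list)}
```

An empty result means that every value has a length, so the `len(M[k])` call would not fail for that reason. A non-empty result lists the keys to inspect.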

RoyLiga commented 4 years ago

Hi, try using the LongPathTool program, it is very helpful.