sabiqali closed this issue 11 months ago.
Hi,
the tool runs in multiple steps, as given in the config (https://github.com/fmfi-compbio/warpstr/blob/bb1b0a62f89d00ff7ec72ac98b24e2b7d68e8d81/example/config.yaml#L12-L18).
The first step, single_read_extraction, extracts all .fast5 files from the input data paths and stores them in the output folder, while also generating the overview file. I see that in the template config this step is incorrectly set to False. Sorry about that. Is this step set to True in your config? When running the tool for the first time, you should set the flags for these steps to True.
Please check whether that solves the problem. If errors persist, please provide your full config file. I will be glad to help.
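For anyone hitting the same missing-overview error, the check described in this reply can be sketched in Python. This is a hypothetical helper, not part of warpstr: it assumes the config has already been parsed into a dictionary with the step flags and "output" as top-level keys, and it infers the <output>/<locus name>/overview.csv layout from the log message in the original question.

```python
from pathlib import Path


def overview_path(output: str, locus_name: str) -> Path:
    """Overview location as printed in the log: <output>/<locus name>/overview.csv."""
    return Path(output) / locus_name / "overview.csv"


def diagnose_overview(config: dict, locus_name: str) -> str:
    """Explain a missing overview file (hypothetical helper, not warpstr code).

    Assumes the step flags and "output" are top-level keys of the parsed config.
    """
    path = overview_path(config.get("output", "."), locus_name)
    if path.is_file():
        return f"overview found: {path}"
    # The first step writes the overview file, so it must be enabled on a first run.
    if config.get("single_read_extraction") is not True:
        return "overview missing: set single_read_extraction to True and rerun"
    return f"overview missing at {path}: check the output value in the config"
```

For example, with single_read_extraction set to False and no overview file on disk, the helper points at the disabled first step rather than at the output path.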
Hi @xsitarcik,
That seems to have solved the issue. The overview.csv file is now being generated.
However, the program does not exit gracefully. It errors out with:
AttributeError: 'Pandas' object has no attribute 'saved'
I also have a question about one of the fields in the config file, the noting field, which has no comment describing it. Could you tell me what it is supposed to contain? I just want to make sure that my config file is completely correct. Is it just the number of repeats expected in the reference? Thank you!
Hi,
saved is a boolean column in the overview file denoting whether the locus was found in that particular read.
Is the tr_region_extraction flag set to True in your config? It must be set to True for the tool to first localize the repeats in the reads and save this information in the overview file.
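The 'Pandas' in the error message is most likely the default name of the namedtuples that pandas' DataFrame.itertuples() yields, so the error suggests that a later step read an overview.csv without a saved column. A small stdlib sketch for inspecting the overview file before rerunning, assuming saved is written as the text True/False (as pandas does for boolean columns); the function name is hypothetical:

```python
from __future__ import annotations

import csv
from pathlib import Path


def saved_summary(overview: Path) -> tuple[int, int]:
    """Return (reads with saved == True, total reads) from an overview CSV.

    Raises ValueError if the saved column is absent, which (by assumption)
    is the situation behind the AttributeError in this thread.
    """
    with open(overview, newline="") as handle:
        reader = csv.DictReader(handle)
        if "saved" not in (reader.fieldnames or []):
            raise ValueError(
                f"{overview} has no 'saved' column; enable tr_region_extraction and rerun"
            )
        rows = list(reader)
    # Assumes booleans were serialized as "True"/"False" text.
    saved = sum(1 for row in rows if row["saved"].strip().lower() == "true")
    return saved, len(rows)
```

If the column is present but no reads are saved, the coord and BAM naming mismatch discussed in this thread is a likely cause.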
If the tr_region_extraction flag was set to True and the error persists, please ensure that coord in the config corresponds to the BAM mapping files and is set correctly; otherwise no reads are found in the extraction phase. The correspondence between coord and the BAM files must be exact, and region names must match as well. For example, if coord is set to region chr1 but there is no such region in the BAM files (usually because it is named differently), no reads are selected for the repeat extraction phase. In that case, check the BAM files to see how regions are named, and rename the coord region accordingly.
As for the noting field in the config, your assumption is correct: the field denotes the reference locus, usually in some concise representation. It serves no purpose other than providing supplementary information for potential evaluation and comparison with the locus sequences predicted by the tool.
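The coord/BAM correspondence check described in this reply can be sketched without warpstr itself, by comparing the contig in coord against the @SQ SN: names in the BAM header (for example, the output of samtools view -H). The contig:start-end format assumed for coord here is an assumption, as is the chr-prefix heuristic; all function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed coord format "contig:start-end", e.g. "chr9:1000-2000".
COORD_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coord(coord: str) -> tuple[str, int, int]:
    """Split a coord string into (contig, start, end); commas are tolerated."""
    match = COORD_RE.match(coord.replace(",", "").strip())
    if match is None:
        raise ValueError(f"unrecognized coord: {coord!r}")
    return match["contig"], int(match["start"]), int(match["end"])


def header_contigs(sam_header: str) -> set[str]:
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def check_coord(coord: str, sam_header: str) -> str | None:
    """Return None if the coord contig is in the header, otherwise a hint."""
    contig, _, _ = parse_coord(coord)
    contigs = header_contigs(sam_header)
    if contig in contigs:
        return None
    # Common naming mismatch: "chr9" in the config vs "9" in the BAM, or vice versa.
    alt = contig[3:] if contig.startswith("chr") else "chr" + contig
    if alt in contigs:
        return f"{contig} not in BAM header; did you mean {alt}?"
    return f"{contig} not in BAM header"
```

Working on the header text rather than the BAM file keeps the sketch dependency-free; a pysam-based version would read the same names from the file directly.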
Hi,
I am trying to set up warpstr and use it to analyze some loci that we have. After installing it and updating the config file as described in the README, I ran into some errors while running the software.
The error output is pasted below:
2022-11-07 11:40:40 Processing 1 of 1
Locus name: c9orf72
Flank length not set for locus - using default value of 110
Sequence was not set for c9orf72. Automatic configuration defined sequence as: CCCC(GGCCCC)GG derived from reference sequence CCCC(GGCCCC)[2]GG
Not found the overview file /.mounts/labs/simpsonlab/users/schaudhary/projects/2022.10.STRr10toolkit/warpstr/output_folder/c9orf72/overview.csv - Please check the "output" in config
It then errors out with:
FileNotFoundError: [Errno 2] No such file or directory:
Could you tell me why the overview file is not being generated? The input in question is a cell line that contains the locus and was prepared using Cas9. I have also specified the output folder where I would like all the output files to be written.
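The log line in this question shows the concise reference notation CCCC(GGCCCC)[2]GG, similar to the concise representation used in the noting field, being reduced to the sequence CCCC(GGCCCC)GG. A minimal sketch of reading that notation, assuming that (UNIT)[n] means n consecutive copies of UNIT; this is an illustration, not warpstr's parser, and the function names are hypothetical:

```python
import re

# Assumed notation: "(UNIT)[n]" is n consecutive copies of an uppercase DNA unit.
REPEAT_RE = re.compile(r"\(([ACGTN]+)\)\[(\d+)\]")


def expand(notation: str) -> str:
    """Expand e.g. CCCC(GGCCCC)[2]GG into the plain nucleotide sequence."""
    return REPEAT_RE.sub(lambda m: m.group(1) * int(m.group(2)), notation)


def strip_counts(notation: str) -> str:
    """Drop the [n] counts, mirroring CCCC(GGCCCC)[2]GG -> CCCC(GGCCCC)GG in the log."""
    return re.sub(r"\[\d+\]", "", notation)
```

Here expand("CCCC(GGCCCC)[2]GG") returns "CCCCGGCCCCGGCCCCGG", while strip_counts gives the count-free pattern shown in the log.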