Open HelloWorldLTY opened 2 months ago
Also, I wonder whether it is possible to read variants such as insertions rather than only substitutions. It seems that the current design cannot handle alternative alleles of different lengths.
File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/data/dataset.py:599, in VariantDataset._load_alleles(self, variants)
    597 def _load_alleles(self, variants: pd.DataFrame) -> None:
    598     self.ref = strings_to_indices(variants.ref.tolist())
--> 599     self.alt = strings_to_indices(variants.alt.tolist())

File ~/.conda/envs/evo/lib/python3.11/site-packages/grelu/sequence/format.py:251, in strings_to_indices(strings, add_batch_axis)
    247         return arr
    249 # Convert multiple sequences; they must all have equal length
    250 else:
--> 251     assert check_equal_lengths(
    252         strings
    253     ), "All input sequences must have the same length."
    254     return np.stack(
    255         [[BASE_TO_INDEX_HASH[base] for base in string] for string in strings]
    256     ).astype(np.int8)

AssertionError: All input sequences must have the same length.
Thanks a lot.
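For readers hitting the same assertion: the traceback shows that `strings_to_indices` stacks all alt alleles into one array, so it fails as soon as the alt strings differ in length from one another. A possible stopgap, sketched below in plain Python, is to keep only single-base substitutions before building the dataset. The helper names `all_same_length` and `keep_snvs` and the example records are hypothetical and are not part of gReLU.

```python
def all_same_length(strings):
    """Return True if every string has the same length (or the list is empty)."""
    return len({len(s) for s in strings}) <= 1


def keep_snvs(variants):
    """Keep only variants whose ref and alt alleles are both a single base."""
    return [v for v in variants if len(v["ref"]) == 1 and len(v["alt"]) == 1]


# Hypothetical example records: one substitution, one insertion, one deletion.
variants = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr1", "pos": 200, "ref": "A", "alt": "ATT"},
    {"chrom": "chr2", "pos": 300, "ref": "CT", "alt": "C"},
]

# Mixed alt lengths are the condition that trips the assertion in the traceback.
mixed_alts_ok = all_same_length([v["alt"] for v in variants])

# After filtering, all remaining alleles have length 1 and can be stacked.
snvs = keep_snvs(variants)
snv_alts_ok = all_same_length([v["alt"] for v in snvs])
```

This drops the indels rather than supporting them, so it only helps when the substitutions alone are of interest.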
Hi @HelloWorldLTY, thanks for raising these points. We do not currently support VCF reading or indels, but we are working on indel support and hope to add it soon.
Thanks. The best plan I have for now is to iterate over the sequences, assign a separate calling object vr to each one, and map the multiple insertions one sequence at a time. It would be very helpful to have such functions built in.
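One way to read the per-sequence plan above is sketched here in plain Python: apply each variant to its own sequence string, and when several variants hit the same sequence, apply them from right to left so earlier offsets stay valid. This is only an illustration under assumptions. The functions `apply_variant` and `apply_variants` are hypothetical, offsets are 0-based positions within the sequence string, and the variants are assumed not to overlap.

```python
def apply_variant(seq, offset, ref, alt):
    """Replace `ref` with `alt` at a 0-based offset in `seq`.

    Works for substitutions, insertions, and deletions because ref and alt
    may have different lengths. Raises if the sequence does not contain
    `ref` at the given offset.
    """
    if seq[offset:offset + len(ref)] != ref:
        raise ValueError(f"Reference allele {ref!r} not found at offset {offset}")
    return seq[:offset] + alt + seq[offset + len(ref):]


def apply_variants(seq, variants):
    """Apply several non-overlapping (offset, ref, alt) variants to one sequence.

    Variants are applied from the highest offset to the lowest, so a length
    change from one variant never shifts the offsets of those still pending.
    """
    for offset, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        seq = apply_variant(seq, offset, ref, alt)
    return seq


# A single insertion: G -> GTT at offset 2.
inserted = apply_variant("ACGTAC", 2, "G", "GTT")

# An insertion and a deletion on the same sequence.
edited = apply_variants("ACGTAC", [(1, "C", "CA"), (4, "AC", "A")])
```

Note that the edited sequences no longer have the original length, so they would still need trimming or padding before being passed to a model that expects fixed-length input.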
Hi, thanks for your great work. Do you now support loading variants from VCF files and filtering them based on VAF, DP, DQ, etc.? Thanks a lot.
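Until something like this is available in the library, a rough pre-processing step outside of it might look like the sketch below. It uses only the Python standard library, reads the eight fixed VCF columns, and filters on a `DP` entry in the INFO column. The function names, the `min_dp` parameter, and the example records are assumptions for illustration. Where depth or allele-fraction values are stored (INFO versus per-sample FORMAT fields) depends on the variant caller, so the field lookup would need adapting.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'DP=35;AF=0.48' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries without '=' map to an empty string
    return fields


def read_vcf_records(lines, min_dp=0):
    """Return one dict per alt allele for records with INFO DP >= min_dp.

    Records without a DP entry are treated as depth 0 (an assumption).
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        dp = int(parse_info(info).get("DP", 0))
        if dp < min_dp:
            continue
        # Multi-allelic records list their alt alleles separated by commas.
        for allele in alt.split(","):
            records.append(
                {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": allele, "dp": dp}
            )
    return records


# Hypothetical VCF content used only to exercise the functions.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35",
    "chr1\t200\t.\tA\tATT,C\t50\tPASS\tDP=12",
    "chr2\t300\t.\tCT\tC\t50\tPASS\tDP=4",
]
filtered = read_vcf_records(example_lines, min_dp=10)
```

With a real file, the same function can be fed an open file handle, for example `read_vcf_records(open(path), min_dp=10)`, since it only iterates over lines.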