What does this PR do?
Allows the user to encode sequences with a specified vocab type instead of having the `construct_embedding` method infer the vocab type automatically. This is useful for downstream tasks because it lets the user apply different encodings to the same sequences.
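To illustrate the difference between inferred and explicit vocab types, here is a minimal sketch that assumes a simple dict-based vocab. The names `AMINO_ACIDS_20`, `VOCABS`, `infer_vocab_type`, and `encode` are hypothetical and are not this project's actual API:

```python
from typing import Dict, List, Optional

# Hypothetical vocabularies keyed by vocab type name (illustrative only).
AMINO_ACIDS_20 = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids
NUCLEOTIDES = "ACGT"

VOCABS: Dict[str, Dict[str, int]] = {
    "aa20": {ch: i for i, ch in enumerate(AMINO_ACIDS_20)},
    "dna": {ch: i for i, ch in enumerate(NUCLEOTIDES)},
}


def infer_vocab_type(seq: str) -> str:
    """Pick the smallest vocab covering every character (stand-in for automatic detection)."""
    for name in ("dna", "aa20"):
        if set(seq) <= set(VOCABS[name]):
            return name
    raise ValueError(f"no vocab covers sequence {seq!r}")


def encode(seq: str, vocab_type: Optional[str] = None) -> List[int]:
    """Encode seq as integer indices; infer the vocab only when none is given."""
    name = vocab_type if vocab_type is not None else infer_vocab_type(seq)
    vocab = VOCABS[name]
    missing = set(seq) - set(vocab)
    if missing:
        raise ValueError(f"characters {sorted(missing)} not in vocab {name!r}")
    return [vocab[ch] for ch in seq]
```

Because `ACGT` is valid under both vocabs, `encode("ACGT")` infers `dna` and returns `[0, 1, 2, 3]`, while `encode("ACGT", vocab_type="aa20")` returns `[0, 1, 5, 16]`: the same sequence, two different encodings.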
Changes
- Allows the user to pass preprocessor arguments through `load_dataset(**pp_kwargs)`, giving control over how the preprocessor processes the dataset.
- Allows the user to specify the vocab type used to encode sequence features.
- Adds a vocab that uses only the 20 natural amino acids.
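Forwarding `**pp_kwargs` from `load_dataset` to the preprocessor can be sketched as follows. `Preprocessor`, its `vocab_type` and `max_length` parameters, and the in-memory `load_dataset` signature are hypothetical stand-ins, not the project's real classes or arguments:

```python
from typing import Any, Dict, List, Optional


class Preprocessor:
    """Hypothetical preprocessor whose behaviour is configured by keyword arguments."""

    def __init__(self, vocab_type: Optional[str] = None, max_length: Optional[int] = None) -> None:
        self.vocab_type = vocab_type  # None means "infer the vocab type"
        self.max_length = max_length  # None means "do not truncate"

    def __call__(self, seq: str) -> Dict[str, Any]:
        if self.max_length is not None:
            seq = seq[: self.max_length]
        return {"sequence": seq, "vocab_type": self.vocab_type or "inferred"}


def load_dataset(sequences: List[str], **pp_kwargs: Any) -> List[Dict[str, Any]]:
    """Build a preprocessor from the caller's kwargs and apply it to every sequence."""
    preprocessor = Preprocessor(**pp_kwargs)
    return [preprocessor(seq) for seq in sequences]
```

For example, `load_dataset(["MKTAYIAK"], vocab_type="aa20", max_length=4)` returns `[{"sequence": "MKTA", "vocab_type": "aa20"}]`. Passing the kwargs straight into the preprocessor's constructor means a misspelled option raises a `TypeError` immediately instead of being silently ignored.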