Open treblegaia opened 11 months ago
Thanks for raising this issue!
Yes, pad_seqs_sdata will need to be updated to actually use the seq_var. For now you can run:
import seqpro as sp
import xarray as xr
padded_seqs = sp.pad_seqs(seqs=sdata["hg38"].values, length=1000)
sdata["hg38_padded"] = xr.DataArray(padded_seqs, dims=["_sequence", "length"])
or you can change the variable name to "seq" using:
sdata = sdata.rename_vars({"hg38": "seq"})
pp.pad_seqs_sdata(sdata, length=1000, seq_var="hg38")
I ran the above line and got this error:
---> [97] padded_seqs = sp.pad_seqs(seqs=sdata["seq"].values, pad=pad, pad_value=pad_value, length=length)
[98] sdata[f"{seq_var}_padded"] = xr.DataArray(padded_seqs, dims=["_sequence", "length"])
--> [185] raise KeyError(key)
KeyError: 'seq'
https://github.com/ML4GLand/EUGENe/blob/13db749d9a639d8baf0a92f536b6dcca02e9c838/eugene/preprocess/_seqdata.py#L97C101-L97C101
Does this mean the seq_var of the sdata has to be "seq"?
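For anyone following along, here is a rough sketch of what the traceback seems to show: line 97 reads sdata["seq"] regardless of the seq_var argument, while line 98 does use seq_var for the output name. This is not the EUGENe source. A plain dict stands in for the sdata object, and pad_right, pad_seqs_hardcoded and pad_seqs_fixed are hypothetical names made up for illustration only.

```python
# Illustration only: a plain dict stands in for sdata, and every function
# here is a hypothetical stand-in, not the EUGENe or seqpro implementation.


def pad_right(seqs, length, pad_value="N"):
    """Right-pad each string with pad_value up to length (longer strings are left unchanged)."""
    return [s.ljust(length, pad_value) for s in seqs]


def pad_seqs_hardcoded(sdata, length, seq_var="seq"):
    # Mirrors the pattern in the traceback: the key "seq" is read no matter
    # what seq_var is, so a dataset without a "seq" entry raises KeyError.
    padded = pad_right(sdata["seq"], length)
    sdata[f"{seq_var}_padded"] = padded


def pad_seqs_fixed(sdata, length, seq_var="seq"):
    # Assumed fix: look up the variable named by seq_var instead.
    padded = pad_right(sdata[seq_var], length)
    sdata[f"{seq_var}_padded"] = padded


sdata = {"hg38": ["ACGT", "AC"]}

try:
    pad_seqs_hardcoded(sdata, length=6, seq_var="hg38")
    hardcoded_error = None
except KeyError as err:
    hardcoded_error = err.args[0]  # the missing key, "seq"

pad_seqs_fixed(sdata, length=6, seq_var="hg38")
print(hardcoded_error)
print(sdata["hg38_padded"])
```

If that reading of the traceback is right, then with the current code the variable does have to be named "seq", and either workaround above (padding with seqpro directly, or renaming the variable) avoids the hardcoded lookup.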