Open SpikyClip opened 4 months ago
Hi, glad to hear it's useful.
You could look at this Python script: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py
to see the format of the .somalier files, specifically the `read_somalier`
function. You could then write the file back out with a new name and name length, with everything else mostly the same.
To reverse the operations you see there, you can use `int.to_bytes` and `arr.tobytes()`.
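A minimal sketch of that idea, not a tested tool. It assumes the file starts with a one-byte version, then a one-byte sample-name length, then the name itself, and that everything after the name can be copied verbatim; please confirm this against `read_somalier` before relying on it. The names `rename_sample` and `rename_file` are hypothetical helpers, not part of somalier.

```python
from pathlib import Path


def rename_sample(data: bytes, new_name: str) -> bytes:
    """Return a copy of a .somalier payload with the embedded sample name replaced.

    ASSUMED layout (verify against read_somalier): byte 0 is the format
    version, byte 1 is the sample-name length, followed by the name and
    then the rest of the file.
    """
    version = data[0]
    old_len = data[1]
    rest = data[2 + old_len:]  # everything after the old name, copied untouched

    encoded = new_name.encode()
    if len(encoded) > 255:
        raise ValueError("sample name does not fit in a one-byte length field")

    return (
        version.to_bytes(1, byteorder="little")
        + len(encoded).to_bytes(1, byteorder="little")
        + encoded
        + rest
    )


def rename_file(src: Path, dst: Path, new_name: str) -> None:
    """Hypothetical helper: read src, swap the name, write the result to dst."""
    dst.write_bytes(rename_sample(src.read_bytes(), new_name))
```

Because only the header is rewritten, there is no need to parse the count arrays at all. It would be safest to write to a new path and check the result by reading it back with `read_somalier` before replacing any originals.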
Thanks for the response, I'll give it a shot when I have the time and let you know how I go.
Hi,
First off, this has been an amazingly useful tool for my work, really appreciate it!
So I didn't realise at the time that sample names are hardcoded at the `somalier relate` stage (i.e. renaming the somalier files does not affect the output of `relate`. If I'm not wrong it actually gets the name from within the VCF/BAM?). This results in issues if a sample was run multiple times across batches and you try to `relate` them across batches.
Is there any way of renaming the sample name within the .somalier binary files? I would like to write a script that recursively loops through my batch folders, appending the `batch_id` and `date_processed` to each sample name, as I have run hundreds of these on a per-batch basis. I understand that the appropriate way is to have set `output-prefix` in the `somalier extract` stage, but I'd rather not have to recall all these BAMs/VCFs to rerun somalier if possible.
It would be great if `somalier relate` had some sort of `--samplename-from-filename` flag that would rely on the filename for the sample name (though admittedly, it feels a little hacky). Or a simple `--samplesheet samplenames.csv` that maps two columns, `sample,somalier_path`, for renaming.