ISUgenomics / SequelTools

GNU General Public License v3.0

Remark on SMRTcell names in plots #3

Open eadriaensen opened 4 years ago

eadriaensen commented 4 years ago

Thank you for bringing us this nice and user-friendly tool.

While generating plots for my dataset, I have the following remark concerning how the names for the different SMRTcells are generated. The current code, shown below, did not work for my particular situation. Could it perhaps be adjusted to something that works more generally? I know I can adjust the script to my wishes and feed that to the tool, which is why I call this a 'remark'.

#Determine names of SMRTcells
pairNames = c()
for(fileName in SMRTcellStatsFiles){
    pairName = strsplit(strsplit(fileName,".SMRTcellStats_noScraps.txt")[[1]], "/")[[1]][2]
    pairNames = append(pairNames, pairName)
}

Reproducible example of my situation using absolute path(s):

fileName1 = "/some/path/to/some/folder/m54138_180610_050652.subreads.bam"
strsplit(strsplit(fileName1,".SMRTcellStats_noScraps.txt")[[1]], "/")[[1]][2]
#"some"

Hence, the SMRTcell names in my plots are very uninformative, since they depend on the folder name rather than the file name.

Also, if I were to use SequelTools for multiple SMRTcells in one go, this might be an issue, as I expect my second sample would be assigned the same label:

fileName2 = "/some/path/to/some/other/folder/m54138_180610_050653.subreads.bam"
strsplit(strsplit(fileName2,".SMRTcellStats_noScraps.txt")[[1]], "/")[[1]][2]
#"some"
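
One possible way to make the label independent of the folder layout is sketched below. It is only an illustration under stated assumptions, not the SequelTools implementation: the helper name `smrtcellLabel` and the list of suffixes to strip are hypothetical. It takes the last path component with `basename()` and then removes a known suffix, so absolute and relative paths give the same label.

```r
# Hypothetical helper (not part of SequelTools): derive a plot label from a
# file path by dropping the directory part and a known file-name suffix.
# The suffix list is an assumption made for this sketch.
smrtcellLabel <- function(fileName,
                          suffixes = c(".SMRTcellStats_noScraps.txt",
                                       ".subreads.bam")) {
    label <- basename(fileName)  # keep only the last path component
    for (suffix in suffixes) {
        # endsWith() compares fixed strings, so the dots in the suffix are
        # not interpreted as regular-expression wildcards
        if (endsWith(label, suffix)) {
            label <- substr(label, 1, nchar(label) - nchar(suffix))
            break
        }
    }
    label
}

fileName1 <- "/some/path/to/some/folder/m54138_180610_050652.subreads.bam"
fileName2 <- "/some/path/to/some/other/folder/m54138_180610_050653.subreads.bam"

pairNames <- vapply(c(fileName1, fileName2), smrtcellLabel, character(1),
                    USE.NAMES = FALSE)
print(pairNames)
# [1] "m54138_180610_050652" "m54138_180610_050653"
```

With this approach the two files above receive distinct labels, and a name that matches none of the listed suffixes is returned unchanged as its base name.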

aseetharam commented 4 years ago

Thanks for using SequelTools and for providing the feedback! Did you use a fofn (file of file names) as input for SequelTools? We designed SequelTools to run on multiple samples at once, since we very much wanted this feature, which SMRTlink did not provide. Check out the examples in the benchmarking section. If you use it as shown, you will get the file names as the labels in your final plots. We did not want the folder names to be the sample names, as this is something we can't expect users to set up prior to running SequelTools. Hope this helps! Thanks,

eadriaensen commented 4 years ago

Thanks for your swift reply.

I do indeed use a fofn as input, though I am using your tool on only one sample at a time. I agree that processing multiple samples in one go is a very nice feature; I am just not doing so yet.

Let me clarify my situation a bit further, using only one sample. The fofn is called subreads.txt and contains just one line (scraps.txt is analogous):

/some/path/to/some/folder/m54138_180610_050652.subreads.bam

The job I run: ./SequelTools.sh -t Q -u subreads.txt -c scraps.txt

The label I get for this single sample in all plots is "some".

Hope this is clear.

I just wanted to inform you of this possibly unintended behavior, although I understand that my usage of the tool (a single sample at a time) might also be considered 'unintended'.

aseetharam commented 4 years ago

Hi @eadriaensen, really not sure what's happening. I tested it out today and I'm getting filenames as the labels (for just one file). What environment are you using it in? If Linux/Mac, what environment?

Thanks,