dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0

rf-combine: file name used instead of transcript ID #51

Closed. Asperatus22 closed this issue 8 months ago.

Asperatus22 commented 8 months ago

Hi Danny,

First, I wish you all the best for the new year.

I read the documentation and found how to merge normalized replicate data for a transcript with rf-combine. I used it with version 2.6.3 some time ago without any problem, but when I tried again with 2.6.8 I ran into some trouble.

rf-combine does not use the transcript ID stored in the XML file (which can differ between replicates) but the file name to identify a common transcript. This means that replicate files must have the same name.

Can you please help me understand why?

Thanks a lot for your work.

Lionel

dincarnato commented 8 months ago

Hi Lionel,

Yes, the file name is used because, per the RNA Framework convention, it should be identical to the transcript ID in the XML file. rf-combine can also merge entire folders of XML files, and if the ID and file name were different, the program would first have to open and read every single file to match the corresponding transcripts between the two folders.

Concerning your issue, I am not sure I understand what you mean. Can you provide two files to reproduce it?

Best, Danny
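The folder-pairing logic described above can be sketched in a few lines of Python. This is a minimal illustration, not rf-combine's actual code: `pair_by_filename` is a hypothetical helper, and it assumes each XML file is named `<transcript ID>.xml`. Files are paired by base name alone, so none of them has to be opened. It also shows why replicate-specific names such as `rep1_Target.xml` and `rep2_Target.xml` end up unmatched.

```python
from pathlib import Path


def pair_by_filename(files_a, files_b):
    """Pair replicate XML files by base name (hypothetical helper).

    Assumes the base name (without .xml) equals the transcript ID, so the
    files themselves are never opened. Returns (pairs, unmatched_a, unmatched_b).
    """
    index_a = {Path(f).stem: f for f in files_a}
    index_b = {Path(f).stem: f for f in files_b}
    shared = sorted(index_a.keys() & index_b.keys())
    pairs = {tid: (index_a[tid], index_b[tid]) for tid in shared}
    unmatched_a = sorted(index_a.keys() - index_b.keys())
    unmatched_b = sorted(index_b.keys() - index_a.keys())
    return pairs, unmatched_a, unmatched_b


if __name__ == "__main__":
    # Same base name in both folders: the transcripts are paired.
    print(pair_by_filename(["rep1/Target.xml"], ["rep2/Target.xml"]))
    # -> ({'Target': ('rep1/Target.xml', 'rep2/Target.xml')}, [], [])

    # Replicate-specific prefixes: nothing is paired.
    print(pair_by_filename(["rep1/rep1_Target.xml"], ["rep2/rep2_Target.xml"]))
    # -> ({}, ['rep1_Target'], ['rep2_Target'])
```

Matching on names is a dictionary lookup per file; matching on the IDs stored inside the files would require parsing every file in both folders first.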

Asperatus22 commented 8 months ago

Hi Danny,

As I process my replicates independently, the reference FASTA files are named repX_Target.fa, so for the same target the file names differ between replicates. When I looked at an older script of mine, I noticed that I had renamed the files. I will solve my issue by processing my replicates differently, so that my files have the same name at the end of my pipeline.

Thanks a lot for your work,
Lionel
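The renaming step Lionel mentions can be sketched as follows. This assumes a hypothetical `repN_` prefix scheme like `rep1_Target.xml`; `strip_replicate_prefix` and `rename_replicate_files` are illustrative helpers, not part of RNA Framework. Note that this only changes file names. If the transcript ID inside the XML also carries the prefix, the name and ID will no longer agree, which is why naming the reference sequences identically upstream (as Lionel plans) is the cleaner fix.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: a replicate prefix such as "rep1_" or "rep12_".
REPLICATE_PREFIX = re.compile(r"^rep\d+_")


def strip_replicate_prefix(name):
    """Return the file name without a leading 'repN_' prefix, if present."""
    return REPLICATE_PREFIX.sub("", name, count=1)


def rename_replicate_files(folder, dry_run=True):
    """Strip the replicate prefix from every *.xml file in `folder`.

    Returns the list of (old_name, new_name) pairs. With dry_run=True
    (the default) nothing is renamed on disk.
    """
    changes = []
    for path in sorted(Path(folder).glob("*.xml")):
        new_name = strip_replicate_prefix(path.name)
        if new_name != path.name:
            changes.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes
```

Running it with `dry_run=True` first lists the planned renames, which makes it easy to spot collisions before any file is touched.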

dincarnato commented 8 months ago

If you process one file at a time, I can modify rf-combine so that it can merge two given XML files irrespective of their file names.
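When the two files are given explicitly, the pairing comes from the user rather than from the names, so the names can be anything. A minimal sketch, assuming the RNA Framework-style layout `<data><transcript id="..."> ... </transcript></data>` (check this against your own files); `transcript_id` and `pair_explicit` are hypothetical helpers, not rf-combine internals. The IDs are read from the content only to report whether they differ.

```python
import xml.etree.ElementTree as ET


def transcript_id(xml_text):
    """Return the `id` attribute of the first <transcript> element.

    Assumes <transcript> is the root or a direct child of the root.
    """
    root = ET.fromstring(xml_text)
    node = root if root.tag == "transcript" else root.find("transcript")
    if node is None:
        raise ValueError("no <transcript> element found")
    return node.get("id")


def pair_explicit(xml_a, xml_b):
    """Pair two replicate files that the user has matched explicitly.

    File names play no role. IDs may legitimately differ
    (e.g. rep1_Target vs rep2_Target), so a mismatch is only reported.
    """
    id_a, id_b = transcript_id(xml_a), transcript_id(xml_b)
    return {"ids": (id_a, id_b), "same_id": id_a == id_b}
```

To use it on files on disk, pass the file contents, e.g. `pair_explicit(Path("rep1_Target.xml").read_text(), Path("rep2_Target.xml").read_text())`.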

Asperatus22 commented 8 months ago

It could be an idea to add this as an option in the next release. But I will probably use your framework on multiple targets, so it would be better for me to manage my pipeline differently.
Just out of curiosity, why did you choose to produce a separate .xml file for each transcript instead of one XML file for all transcripts that is then parsed by transcript ID? Maybe because of file size?

dincarnato commented 8 months ago

Hi Lionel,

I have added the possibility to combine 2 XML files whose file names do not match. Can you please try it and let me know?

To answer your question: having one XML file per transcript gives flexibility when exporting the data and performing downstream operations. A single XML file would require full parsing every time.

Cheers, Danny
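The difference Danny describes can be illustrated with two lookup functions. This is a sketch under assumptions: the per-transcript layout is `<folder>/<ID>.xml`, and the single-file layout, which RNA Framework does not use, is a hypothetical `<data>` root holding many `<transcript id="...">` elements. With one file per transcript, the path is derived from the ID and only that small file is parsed. With a single file, every transcript preceding the requested one must be parsed first, and whole-file operations always parse everything.

```python
import io
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_from_folder(folder, tid):
    """Per-transcript layout: open and parse only <folder>/<tid>.xml."""
    path = Path(folder) / f"{tid}.xml"
    return ET.parse(path).getroot().find("transcript")


def load_from_single_file(source, tid):
    """Hypothetical single-file layout: scan <transcript> elements in order.

    Everything before the match is parsed; a missing ID means parsing
    the whole file. Returns None if the ID is not found.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "transcript" and elem.get("id") == tid:
            return elem
    return None


if __name__ == "__main__":
    combined = (b'<data>'
                b'<transcript id="A"><sequence>ACGU</sequence></transcript>'
                b'<transcript id="B"><sequence>GGCC</sequence></transcript>'
                b'</data>')
    hit = load_from_single_file(io.BytesIO(combined), "B")
    print(hit.find("sequence").text)  # GGCC

    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "B.xml").write_text(
            '<data><transcript id="B"><sequence>GGCC</sequence></transcript></data>')
        print(load_from_folder(folder, "B").find("sequence").text)  # GGCC
```

The per-file cost stays constant no matter how many transcripts exist, whereas the single-file scan grows with the position of the transcript (or the size of the file).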