IARCbioinfo / PVAmpliconFinder

GNU General Public License v3.0
Formatting for Infos File #2

Closed cwarden45 closed 4 years ago

cwarden45 commented 4 years ago

Hi,

I am trying to first test PVAmpliconFinder on the 8 demo samples, and then on some samples with a different L1 Amplicon primer set.

I think I might need to manually configure some programs (since I got an error message with the automatic installation method), but I am not sure what is currently missing.

So, I thought testing the 8 current samples would be the best strategy.

Even though it is an optional parameter, do you have an example of what the -f info_file should look like for PVAmpliconFinder.sh (which I think is actually being used by PVAmpliconFinder_step2.pl)?

I think this is supposed to contain information about the primers, pool design, and tissues?

If you have this already created for the 8 demo samples from the paper, then that would be great.

Thank you very much.

Sincerely, Charles

SixEl27 commented 4 years ago

Dear Charles,

Thanks for testing PVAmpliconFinder.

Please let me know if you identify the missing part in the automatic installation script. It was only tested on Ubuntu 16.04 LTS and CentOS 7.0, so it may miss some packages for other distributions. The packages needed for installation in a Mac environment are also included in the installation script, but I have not tested the automatic installation myself on a Mac.

You can find the info file corresponding to the 8 test files as Supplementary Table 1 of the PVAmpliconFinder publication in BMC Bioinformatics.

I also attach the file to this reply, and I will upload it to the GitHub repo.

I'll take time to answer your second issue soon.

Sincerely, Alexis

info_file.txt

cwarden45 commented 4 years ago

Thank you very much for pointing out that file.

However, it looks like the primer sequences themselves are not in the file. Do I need to specify those in a different way?

Also, I think I am getting a formatting issue, as described in the other ticket (#3).

For example, it sounds like the expected header may be ID primer pool, but this is the content of Table S1 as a tab-delimited text file:

ID                            Primer set    Tissue
pool1-skin-pathogen_S1_L001   Beta3_1       skin
pool2-skin-pathogen_S2_L001   Beta3_2       skin
pool3-skin-pathogen_S3_L001   FAP           skin
pool4-skin-pathogen_S4_L001   FAPM1         skin
pool5-skin-pathogen_S5_L001   CUT           skin
pool6-oral-pathogen_S6_L001   FAPM1         oral
pool7-oral-pathogen_S7_L001   FAPM2         oral
pool8-oral-pathogen_S8_L001   CUT           oral

If I use the header provided by the program, then does that mean I won't be specifying the tissue information?

I apologize that I need to re-open the ticket, but I think some reformatting is required?

I also appreciate that you attached the file, even though I did not notice that at first (and it looks the same as the file that I created from the supplemental table).

SixEl27 commented 4 years ago

Thanks for noticing the restriction on the header format of this file, which I had not included in the documentation. This is now corrected: I updated the README to specify the exact format required for the header, and I modified this line in info_file.txt. Indeed, this tab-delimited file must contain as its first line: ID primer tissue
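
For readers who want to check a file before running the pipeline, here is a minimal sketch, assuming the info file is tab-delimited and that the first line must match the three column names exactly. Whether PVAmpliconFinder compares the header case-sensitively is an assumption here, and `check_info_header` is a hypothetical helper, not part of PVAmpliconFinder.

```python
import csv
import io

# Header stated in this thread; exact (case-sensitive) matching is an assumption.
EXPECTED_HEADER = ["ID", "primer", "tissue"]


def check_info_header(text):
    """Return True if the first tab-delimited row equals the expected header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    first_row = next(reader, [])
    return first_row == EXPECTED_HEADER


# Header as exported from the supplementary table (rejected by this check).
supp_table = "ID\tPrimer set\tTissue\npool1-skin-pathogen_S1_L001\tBeta3_1\tskin\n"
# Header after the correction described above (accepted by this check).
fixed = "ID\tprimer\ttissue\npool1-skin-pathogen_S1_L001\tBeta3_1\tskin\n"

print(check_info_header(supp_table))
print(check_info_header(fixed))
```

To check a file on disk, the same function can be given the result of `open(path).read()`, where `path` is wherever the info file is stored.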

My initial upload of this file yesterday matched the format of Supplementary Table 1 of the referenced publication, which differs slightly from the format required by PVAmpliconFinder.

Regarding the primer sequences, they are not needed by the program. The information in the info file is only used to stratify the final results. This makes it possible to show which type of PV is amplified by which primer in each tissue.
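
To illustrate what stratifying by primer and tissue means, here is a rough sketch, not the actual PVAmpliconFinder code. The helpers `load_info` and `stratify` are hypothetical, and the detections (`PV_type_A`, `PV_type_B`) are invented placeholders rather than real results; only the sample IDs, primer names, and tissues come from the info file shown earlier in this thread.

```python
from collections import defaultdict


def load_info(text):
    """Map each sample ID to its (primer, tissue) pair, skipping the header line."""
    info = {}
    for line in text.strip().splitlines()[1:]:
        sample_id, primer, tissue = line.split("\t")
        info[sample_id] = (primer, tissue)
    return info


def stratify(detections, info):
    """Group detected PV types by the (primer, tissue) of the sample they came from."""
    groups = defaultdict(set)
    for sample_id, pv_type in detections:
        groups[info[sample_id]].add(pv_type)
    return dict(groups)


info_text = (
    "ID\tprimer\ttissue\n"
    "pool4-skin-pathogen_S4_L001\tFAPM1\tskin\n"
    "pool6-oral-pathogen_S6_L001\tFAPM1\toral\n"
)

# Placeholder detections, invented purely for illustration.
detections = [
    ("pool4-skin-pathogen_S4_L001", "PV_type_A"),
    ("pool6-oral-pathogen_S6_L001", "PV_type_A"),
    ("pool6-oral-pathogen_S6_L001", "PV_type_B"),
]

result = stratify(detections, load_info(info_text))
for (primer, tissue), pv_types in sorted(result.items()):
    print(primer, tissue, sorted(pv_types))
```

Nothing in this grouping touches the primer sequences themselves, which is consistent with the program not needing them.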

I have already taken into account the issue you had with the installation scripts (#3), and I'll get back to you soon regarding the issue you still have when using PVAmpliconFinder.

cwarden45 commented 4 years ago

Thank you very much for your prompt assistance!

I will focus on the other thread. I no longer see that specific error message when I change the header.