bmansfeld / QTLseqr

QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

df <- importFromTable() troubles #51

Closed juanlu16 closed 1 year ago

juanlu16 commented 2 years ago

Hello bmansfeld

I am having problems running QTLseqr. When I use the "importFromTable" function to assign the value to the variable "df" I am getting the following failures.

I have defined the variables as follows (the names "POOLR_sorted.bam" and "POOLS_sorted.bam" are the names of the two pools I use):

    library("QTLseqr")
    HighBulk <- "POOLR_sorted.bam"
    LowBulk <- "POOLS_sorted.bam"
    Chroms <- paste0(rep("chr2",793229), 2:793230)

Separately, I created a table and saved it as a CSV document with the following settings: "Character set: Unicode (UTF-8)", "Field delimiter: {Tab}", 'String delimiter: " ', and the option "Save cell content as shown" checked.

Subsequently, I have executed the function "importFromTable()", assigning the result to the variable "df" as it is done in the guide. However, I get the following error:

**df <- importFromTable(file="BSA_tabla.csv", highBulk = "POOLR_sorted.bam", lowBulk = "POOLS_sorte.bam", chromList = Chroms)

Error in importFromTable(file = "tablaBSA.csv", highBulk = HighBulk, : No 'CHROM' coloumn found. In addition: Warning message: The following named parsers don't match the column names: CHROM, POS**

After seeing this error, I looked in the help or support offered by R: "?importFromTable()". After looking at this help, I saw that the chromosome list was not needed and that the highBulk and lowBulk values could be assigned directly, so I tried the following:

**df <- importFromTable(file="BSA_tabla.csv", highBulk = "POOLR_sorted.bam", lowBulk = "POOLS_sorte.bam")

Error in importFromTable(file = "tablaBSA.csv", highBulk="POOLR_sorted.bam", : No 'CHROM' coloumn found. In addition: Warning message: The following named parsers don't match the column names: CHROM, POS**

However, I kept getting the same error, and it would not allow me to move forward with the analysis. Could you please help me?
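The warning about CHROM and POS suggests that the parsed header does not contain columns with those names, which can happen when the file's delimiter is not the one the reader expects. A minimal base R sketch for checking this is shown here. It assumes a small made-up file (the REF and ALT columns and all values are hypothetical, only CHROM and POS come from the error message) and simply counts how many fields the header line yields under a few candidate delimiters:

```r
# Hypothetical example file: tab-separated, but saved with a .csv extension.
tmp <- tempfile(fileext = ".csv")
writeLines(c("CHROM\tPOS\tREF\tALT",
             "chr2\t101\tA\tG"), tmp)

# Read only the header line of the file.
header <- readLines(tmp, n = 1)

# Count how many fields the header splits into for each candidate delimiter.
candidates <- c(comma = ",", tab = "\t", semicolon = ";")
n_fields <- sapply(candidates, function(d) {
  length(strsplit(header, d, fixed = TRUE)[[1]])
})
print(n_fields)

# The delimiter that yields the most fields is the likely separator.
likely <- names(which.max(n_fields))
print(likely)
```

If the comma count is 1, the whole header is being read as a single column name, so no column called CHROM can be found.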

Thank you very much in advance for your help.

Best regards,

Juan Luis

bmansfeld commented 2 years ago

Hey Juan,

Sorry you met a bug. It would be helpful if you shared just the top 100 rows of your CSV; that way I could figure out how to troubleshoot. It seems that something must be funky with the column identification and renaming. You can email me the file at bmansfeld --at-- danforthcenter.org

Have you tried parsing your VCF using GATK VariantsToTable and using importFromGATK()?

Hope to be able to help,

Ben

juanlu16 commented 2 years ago

Hi Ben

It's ok, don't worry, and thank you very much for replying and for the help. Sorry for not emailing you at danforthcenter.org. When I try, I get an error. I have not been able to find your contact details on that page either.

I created the table with LibreOffice Calc and saved it with the parameters I described in the bug report I wrote on GitHub. This is what the table looks like when I open it again with LibreOffice Calc:

image

I don't know if it can be seen clearly, if not, let me know and I will send you a larger image.

However, one thing makes me doubt whether the table was generated correctly. When I open it with Excel, it looks like this:

image

I do not know if the problem is in the generation of the table, or if the program does not detect the columns.

The screenshots only show 30-40 lines because the screen does not fit more. If you wish, I can take images of more lines so you can review more of the file.

To answer your question about whether I have tried the data coming from GATK with the function "importFromGATK()": I have not, since I do not have GATK correctly installed and it does not generate the VCF files correctly. Because of this problem, I used bcftools mpileup and bcftools call to get the VCF files.

Again, thank you very much for your help Ben, it will be very useful, I am very grateful.

Best regards,

Juan Luis


bmansfeld commented 2 years ago

QTLseqr::importFromTable() uses the default sep = "," to separate the columns. It looks like you have a tab or space separating the columns when you save the file, so when Excel looks for "," in the CSV it does not find it, and neither does QTLseqr. Try making sure you are exporting either a CSV or a TSV, and import that file using the matching sep argument in QTLseqr::importFromTable().

Hope that helps.

Ben
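As a rough sketch of the two routes described above, the following base R snippet rewrites a tab-separated file as a true comma-separated one, and the commented call shows the alternative of passing the delimiter through the sep argument mentioned in this thread. The example file, its REF and ALT columns, and its values are made up for illustration. The importFromTable() call is left commented out because it needs QTLseqr installed and a complete table, and the file and bulk names in it are simply the ones quoted earlier in the thread.

```r
# Hypothetical tab-separated input (column values are made up).
tmp_tsv <- tempfile(fileext = ".tsv")
writeLines(c("CHROM\tPOS\tREF\tALT",
             "chr2\t101\tA\tG"), tmp_tsv)

# Route 1: convert the file to a real comma-separated CSV with base R.
# colClasses = "character" keeps allele values such as "T" from being
# interpreted as the logical TRUE.
tab <- read.delim(tmp_tsv, colClasses = "character")
tmp_csv <- tempfile(fileext = ".csv")
write.csv(tab, tmp_csv, row.names = FALSE, quote = FALSE)

# The header line is now comma-separated.
csv_header <- readLines(tmp_csv, n = 1)
print(csv_header)

# Route 2 (not run): keep the tab-separated file and state the delimiter.
# df <- QTLseqr::importFromTable(file = "BSA_tabla.csv",
#                                highBulk = "POOLR_sorted.bam",
#                                lowBulk = "POOLS_sorted.bam",
#                                sep = "\t")
```

Either route should give the importer a header it can split into separate column names. Which one is more convenient depends on whether the file is also used by other tools.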