invalid data.source parameter, bed.dir is not found, or is not directory

robert-player commented 2 years ago

Hello,

I'm trying to import a list of files from a very simple sample.annotation.csv file. I've tried passing it (sample.annotation) to rnb.run.import() both as a data.frame (as specified in the manual) and as a file path string, but I still get the error in the subject of this issue. I'm not certain the annotation data.frame/file is really the problem. It looks more like the NULL specification for the data directory is not being accepted, even though the manual describes it here:

## Data import
#        ‘"bs.bed.dir"’            ‘list’ or ‘character’                ‘1..3’
#   (1) Directory with BED files each giving a DNA methylation profile of a sample;
#   (2) a sample annotation table as a ‘data.frame’ or the name of the corresponding file; 
#       In case only the sample sheet is provided as the second element of the data.source list
#       (the first element can be set to NULL), the provided sample sheet should contain absolute
#       paths to the bed files.

data.source <- c(NULL, sample.annotation)
rnb.run.import(data.source=data.source, data.type="bs.bed.dir", dir.reports=report.dir)

A data.frame of the file contents:

[1] "data.frame"
                                                                                                                    Sample_ID  condition
1   /Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/control_1.bed condition1
2   /Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/control_2.bed condition1
3 /Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/treatment_1.bed condition2
4 /Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/treatment_2.bed condition2

Please let me know if you see where I am going wrong here, and thank you for your time! Robert
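
One base R detail is worth checking in the `data.source <- c(NULL, sample.annotation)` line: `c()` silently drops `NULL`, and it splices a data.frame into its columns, so the result is not a two-element object with an empty first slot. The following is a minimal sketch with a toy data frame (the paths are placeholders, not the real files). It only illustrates base R behaviour; whether RnBeads accepts `list(NULL, ...)` for this data type is a separate question.

```r
# Toy sample sheet; the paths are made up for illustration.
sample.annotation <- data.frame(
  Sample_ID = c("/tmp/control_1.bed", "/tmp/treatment_1.bed"),
  condition = c("condition1", "condition2"),
  stringsAsFactors = FALSE
)

# c() drops NULL and splices the data.frame's columns into a plain list.
via.c <- c(NULL, sample.annotation)
length(via.c)          # 2, but these are the two columns, not (NULL, sheet)
names(via.c)           # "Sample_ID" "condition"
is.data.frame(via.c)   # FALSE

# With a file path string instead of a data.frame, only the string survives.
via.path <- c(NULL, "/tmp/sample.annotation.csv")
length(via.path)       # 1

# list() keeps NULL as the first element and the data.frame intact as the second.
via.list <- list(NULL, sample.annotation)
length(via.list)               # 2
is.null(via.list[[1]])         # TRUE
is.data.frame(via.list[[2]])   # TRUE
```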

schmic05 commented 2 years ago

Hi Robert,

As you correctly pointed out, you'll have to specify the path to your data directory rather than NULL as the first element of your data.source parameter. This will look something like:

data.source <- c('/Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/', sample.annotation)

Please also see our vignette for more details.

Best, Michael

robert-player commented 2 years ago

Thanks Michael,

Could you comment on this part of the manual then? I may be misinterpreting it, but it seems to say that NULL is an acceptable first element, as described here:

In case only the sample sheet is provided as the second element of the data.source list (the first element can be set to NULL), the provided sample sheet should contain absolute paths to the bed files.

Is there a way to force RnBeads to use the absolute file paths specified in the sample.annotation (sample sheet), instead of a combination of the single directory from the first element and filenames from the table in the second element?

Also, wrt the sample.annotation variable, should this be a file path string or a data.frame? (Again, the latter is explicitly called out in the manual, but it also seems not to be accepted.)

Appreciate your time and support on this, Robert

schmic05 commented 2 years ago

Hi Robert,

Thanks for pointing this out; it was indeed insufficiently/incorrectly described in the vignette. The text should read (as in the description of rnb.execute.import):

"In case only the sample sheet is specified, one column should be giving full absolute paths of the BED-like files with sequencing information. If both elements (1) and (2) are specified, the files should reside in the directory, specified as element (1)."

Thus, you should only specify the sample annotation sheet and leave the first element empty. Can you try this?

Meanwhile, I updated the documentation, but it will take a while to propagate.

Best,

Michael

robert-player commented 2 years ago

Understood, however I'm getting the same error:

> sample.annotation <- '/Users/robertplayer/data/tool_rnbeads/test_data_from_bismark/toy_example_1krows_per/sample.annotation'
> data.source <- c(sample.annotation)     # probably not necessary, but tried setting `data.source=sample.annotation`, same results
> rnb.run.import(data.source=data.source, data.type="bs.bed.dir", dir.reports=report.dir)

2022-11-09 09:07:41     0.9  STATUS STARTED Loading Data
2022-11-09 09:07:43     0.9    INFO     Number of cores: 1
2022-11-09 09:07:45     0.9    INFO     Loading data of type "bs.bed.dir"
2022-11-09 09:07:47     0.9  STATUS     STARTED Performing loading test
2022-11-09 09:07:50     0.9    INFO         The first 10000 rows will be read from each data file
2022-11-09 09:07:52     0.9   ERROR         invalid data.source parameter, bed.dir is not found, or is not directory

schmic05 commented 2 years ago

Thanks, Robert.

After having a closer look, it seems as if this option is currently not supported by RnBeads. I apologize for the confusion caused by the documentation and would kindly ask you to store all the files in a single directory and then specify this directory accordingly as your first argument in data.source.

We'll work on fixing the issue.
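
Until a fix lands, the single-directory workaround can be scripted. The sketch below uses only base R and assumes the sample sheet keeps its file paths in one column; `stage.bed.files` is a hypothetical helper, not an RnBeads function, and the demo files are throwaway stand-ins created in a temporary directory.

```r
# Hypothetical helper (not part of RnBeads): copy the BED files listed in a
# sample sheet into one directory and replace the paths with bare file names.
stage.bed.files <- function(sheet, path.column, target.dir) {
  paths <- as.character(sheet[[path.column]])
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("files not found: ", paste(missing, collapse = ", "))
  }
  if (anyDuplicated(basename(paths)) > 0) {
    stop("file names collide once the directories are dropped")
  }
  dir.create(target.dir, recursive = TRUE, showWarnings = FALSE)
  copied <- file.copy(paths, file.path(target.dir, basename(paths)),
                      overwrite = TRUE)
  if (!all(copied)) {
    stop("could not copy: ", paste(paths[!copied], collapse = ", "))
  }
  sheet[[path.column]] <- basename(paths)
  sheet
}

# Self-contained demo: two throwaway files in two different directories.
src.a <- file.path(tempdir(), "run_a")
src.b <- file.path(tempdir(), "run_b")
dir.create(src.a, showWarnings = FALSE)
dir.create(src.b, showWarnings = FALSE)
bed.files <- c(file.path(src.a, "control_1.bed"),
               file.path(src.b, "treatment_1.bed"))
for (f in bed.files) writeLines("chr1\t100\t101", f)

sheet <- data.frame(Sample_ID = bed.files,
                    condition = c("condition1", "condition2"),
                    stringsAsFactors = FALSE)

bed.dir <- file.path(tempdir(), "staged_bed")
staged <- stage.bed.files(sheet, "Sample_ID", bed.dir)

# Write the rewritten sheet next to, not inside, the BED directory.
sheet.file <- file.path(tempdir(), "sample.annotation.csv")
write.csv(staged, sheet.file, row.names = FALSE)

# With RnBeads loaded, the import would then follow the form suggested above:
# data.source <- c(bed.dir, sheet.file)
# rnb.run.import(data.source = data.source, data.type = "bs.bed.dir",
#                dir.reports = report.dir)
```

Copying keeps the originals untouched; for large files, `file.symlink()` could replace `file.copy()` on systems that support symbolic links.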

robert-player commented 2 years ago

OK sounds good. Surprising no one else has experienced this particular issue before. Thanks again for the help Michael, and please lmk when a fix is merged and I'll test it out!

robert-player commented 2 years ago

Hi Michael,

Another question regarding data setup. I'd like to use the condition column of my sample.annotation.csv file as the grouping that diffmeth will use.

I've set options like so:

rnb.options(filtering.sex.chromosomes.removal = TRUE, identifiers.column="Sample_ID", differential.comparison.columns="condition", import.bed.style="bismarkCov", assembly=genome)

but I get the following error:

2022-11-10 20:45:47     8.2  STATUS     STARTED Saving temp objects for debugging
Error in save.rnb.diffmeth(diffmeth, diffmeth.path) : 
  trying to get slot "disk.dump" from an object of a basic class ("NULL") with no slots
Calls: rnb.run.differential -> save.rnb.diffmeth
Execution halted

As far as I can tell from googling (https://rnbeads.org/faq.html#a_editAnnot > Analysis Pipeline > Can I introduce additional sample grouping information for analysis?), this is perhaps an issue with grouping.

Does it look like I've defined the grouping correctly in the rnb.options function?

schmic05 commented 2 years ago

Hi Robert,

Yes, you specified everything correctly. There are two likely reasons for the error message:

(1) the condition column is misspelled
(2) the defined group sizes are not compatible with the RnBeads options min.group.size and max.group.count.

The second case arises, for instance, if a sample group only has a single sample.

Best,

Michael
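
Both causes can be checked on the sample sheet before running the pipeline. This is a minimal base R sketch: `small.groups` is a hypothetical helper, the toy sheet is made up, and the default of 2 simply mirrors the default min.group.size mentioned later in this thread.

```r
# Hypothetical helper: names of the groups that have fewer than min.size samples.
small.groups <- function(sheet, column, min.size = 2) {
  if (!column %in% colnames(sheet)) {
    stop("column '", column, "' not found; available columns: ",
         paste(colnames(sheet), collapse = ", "))
  }
  sizes <- table(sheet[[column]])
  names(sizes)[sizes < min.size]
}

# Toy sample sheet with one group that has only a single sample.
sheet <- data.frame(
  Sample_ID = c("control_1", "control_2", "treatment_1"),
  condition = c("condition1", "condition1", "condition2"),
  stringsAsFactors = FALSE
)

small.groups(sheet, "condition")   # "condition2"

# With RnBeads loaded, the threshold could be taken from the current options:
# small.groups(sheet, "condition", rnb.getOption("min.group.size"))
```

A misspelled column name (for example "Condition") stops with an error that lists the available columns, which covers the first cause as well.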

robert-player commented 2 years ago

Ah, thank you. I was only doing 1 sample per group, and I didn't notice the group size params.

Looks like the default min.group.size is 2, but will diffmeth still fail to run with 1 sample per group if this is set to 1?

I.e. is the absolute minimum for diffmeth a group size of 2?

schmic05 commented 2 years ago

It will also work for only one sample per group, but you will not receive any p-values, since the statistical method that RnBeads uses (limma) requires multiple samples per group.
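
The reason can be illustrated without limma itself. The sketch below uses base R's `lm()` as a stand-in on made-up methylation values for a single site; it is not the RnBeads or limma code path, only the underlying linear-model arithmetic. With one sample per group, the group difference is still estimable, but no residual degrees of freedom remain to estimate a variance, so no test statistic or p-value can be formed.

```r
# Made-up methylation values for one site; not real data.
one.per.group <- data.frame(
  beta = c(0.20, 0.80),
  condition = factor(c("condition1", "condition2"))
)
two.per.group <- data.frame(
  beta = c(0.20, 0.25, 0.80, 0.70),
  condition = factor(rep(c("condition1", "condition2"), each = 2))
)

fit.one <- lm(beta ~ condition, data = one.per.group)
fit.two <- lm(beta ~ condition, data = two.per.group)

# Residual degrees of freedom = number of samples - number of groups.
df.residual(fit.one)   # 0: nothing left to estimate the variance from
df.residual(fit.two)   # 2: variance, and hence a p-value, can be computed

# The difference between the groups is available in both cases.
coef(fit.one)[["conditioncondition2"]]
coef(fit.two)[["conditioncondition2"]]
```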

epigen / RnBeads, issue #33