Hey @elslabbert, have you tried specifying the arguments `log_scale = TRUE` or `inds`? These arguments help ensure that the individual-based rarefaction curves are not computed at every possible integer. When you have a very large N, as is typical in microbial communities, these arguments become more important. I would first try `log_scale = TRUE`, and if that still doesn't work then specify `inds`. You'll have to decide either 1) how many points you want on the individual rarefaction curve or 2) the actual numbers of individuals you want to compute results for, as `inds` can be used to specify either of these. Let us know how it goes. So far we don't have many folks using the package on microbes yet, so I'm eager to see if it's possible.
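In case it helps, here's a minimal sketch of those two ways of supplying `inds`, using the object names from your script (treat the specific values as placeholders for whatever fits your data):

```r
library(mobr)

# Option 1: inds as a single integer, i.e. the number of points to place
# on the individual rarefaction curve
site1_delta_stats <- get_delta_stats(site1_mob_in, 'group', ref_group = 'land_useA',
                                     type = 'discrete', log_scale = TRUE,
                                     inds = 100, n_perm = 99)

# Option 2: inds as a vector of the actual numbers of individuals
# to compute results for
site1_delta_stats <- get_delta_stats(site1_mob_in, 'group', ref_group = 'land_useA',
                                     type = 'discrete',
                                     inds = c(10, 100, 1000, 10000), n_perm = 99)
```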
I noticed that, as currently implemented, the argument `log_scale` does not spread effort across a log scale unless the argument `inds` is set to an integer indicating the number of points at which to compute the curves. I'm going to fix this so that `inds` can be left as `NULL` but, if `log_scale` is set to `TRUE`, the efforts are spread across a log base 2 range.
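To illustrate the intended behavior (this is just a sketch of the idea, not mobr's actual internals):

```r
# A hypothetical community with half a million individuals
N <- 5e5

# Computing the curve at every possible integer means N evaluation points
length(1:N)            # 500000

# Spreading effort across a log base 2 range covers the same span
# with only ~20 points
efforts_log2 <- 2^(0:floor(log2(N)))
efforts_log2           # 1, 2, 4, 8, ..., 262144
length(efforts_log2)   # 19
```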
Hi @dmcglinn, I initially had `log_scale = TRUE`, which didn't work (perhaps because of the issue you mention in your last post). I ran it again, in case you had already managed to fix the code as described above, but it still gave me the same error. I then ran it with `inds = 100`. That seems to be running, although it is still taking some time (it's been going for >60 min). I guess I should try running it with a higher integer value?
Hey @dmcglinn, so I have moved on to running my code on an even larger dataset (>62,000 OTUs) and have set the arguments as discussed above: `log_scale = TRUE`, and `inds` first as 1000, then as 10,000. But I get the error again about the function not being able to allocate such a large vector (`cannot allocate vector of size 912.7 Mb`). Here is the complete code:
```r
for (i in seq_along(dat.list)) {
  mob_in.list[[i]] <- make_mob_in(comm = dat.list[[i]], plot_attr = my_coords[[i]],
                                  coord_names = c('lon', 'lat'), latlong = TRUE)
  # reduced permutations from the 199 set initially (for the plants & fungi) to 99
  # due to the very large bacteria dataset; re-ran the plants and fungi to
  # standardize the analyses across taxa
  mob_result.list[[i]] <- get_mob_stats(mob_in.list[[i]], group_var = "group", n_perm = 99)
  # n_perm = 99 for all three taxa
  delta_result.list[[i]] <- get_delta_stats(mob_in.list[[i]], 'group', ref_group = 'Pasture',
                                            type = 'discrete', log_scale = TRUE, n_perm = 99)
}
```
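(For context on that failed allocation, some back-of-the-envelope arithmetic in plain R, nothing mobr-specific: 912.7 Mb is roughly what a single double-precision vector with one entry per individual would occupy for a community of ~120 million individuals, which is why thinning the effort values matters so much at this scale.)

```r
# Generic R arithmetic, not mobr internals: memory for a double vector
# with one entry per integer effort value up to N individuals
N <- 1.2e8          # hypothetical total count of individuals
N * 8 / 1024^2      # ~915 Mb, about the size of the allocation that failed
```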
@dmcglinn ...so I tried it again and let it run over the weekend with the same settings, and it ran to completion. But I also got 50 warning messages for the dataset saying: `In anova.lm(mod) : ANOVA F-tests on an essentially perfect fit are unreliable`. How would you recommend I address this to make the results more reliable?
Hey @elslabbert, thanks for these updates on this issue. I'm not sure why `get_delta_stats` would generate the anova warnings, because that function never calls `anova`. The function `get_mob_stats` does call it when computing the F-value, which it uses as a test statistic for permutation tests. Is it possible that this is a warning from running `get_mob_stats` rather than `get_delta_stats`? Also, that warning indicates that you have zero or maybe one degree of freedom (it's like a t-test when n = 2).
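For what it's worth, that warning is easy to reproduce outside of mobr with any essentially perfect fit (a generic base R example, unrelated to your data):

```r
# A response that is an exact linear function of the predictor, so the
# residual sum of squares is essentially zero
x <- 1:10
y <- 2 * x
mod <- lm(y ~ x)
anova(mod)
#> Warning message:
#> In anova.lm(mod) : ANOVA F-tests on an essentially perfect fit are unreliable
```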
Hi @dmcglinn, thanks for the prompt follow-up. The loop I have set up runs through both of these functions in sequence, so the warning messages are indeed from `get_mob_stats`, not `get_delta_stats`. I am running one of the datasets separately through these two steps to double-check.
Regarding the last sentence in your comment above: despite an attempt at a balanced sampling design, the data I'm using is not 100% equal in its number of replicates/sampling effort per treatment across sites. Some sites have fewer replicates per treatment group than others (e.g. site 1 has equal sampling effort (30:30), but at site 2 there are only 10 replicates of treatment A and 30 of treatment B). Could this be causing the issue?
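In case it is useful for diagnosing this, here is a quick way to tabulate replicates per treatment group for each site (assuming, as in the loop above, that the group labels live in a `group` column of the plot attribute tables in `my_coords`):

```r
# Count replicates per treatment group at each site
lapply(my_coords, function(plot_attr) table(plot_attr$group))
```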
Is this issue resolved for you now? Thanks!
I can confirm that specifying `log_scale = TRUE` in the function `get_delta_stats` now does reduce the number of sampling points as expected. See #219 for a further update to this function to catch a corner-case bug.
Hi there, I am trying to run several very large microbial datasets using the MOB package (the data has 370 observations and >14,500 variables). So far everything has run smoothly, but then I get what appears to be a computational error at the `get_delta_stats` function.
Script:

```r
site1_mob_in = make_mob_in(site1_dat, site1_coords, coord_names = c('lon', 'lat'), latlong = TRUE)
site1_delta_stats = get_delta_stats(site1_mob_in, 'group', ref_group = 'land_useA',
```
Any suggestions as to how to overcome this error?
Thanks, elslabbert