PriceLab / STP

Code and commentary from The Self-Taught Programmer, Althoff 2016: python + R
Apache License 2.0

logical(true/false) #25

Open aishahmohamed98 opened 5 years ago

aishahmohamed98 commented 5 years ago

@paul-shannon Paul, here is a small snippet of code that you asked of me which generates a vector of 15 random true or false logicals using the runif function as well as the as.logical function!

x <- round(runif(15,0,1))
x.TorF <- as.logical(x)

Please let me know of any comments or improvements to the code i can make.

aishahmohamed98 commented 5 years ago

@paul-shannon Paul,

I've pushed an edited utils.R and test_utils.R to the repo. utils.R has a new function, "generateRandomLogicals", and test_utils.R has a test that checks whether the randomly generated logicals are about 50/50. Please let me know what you think and whether you have any comments for improvement.
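The utils.R file itself is not shown in this thread, so the following is only a minimal sketch of what generateRandomLogicals might look like, built from the runif/as.logical snippet above. The function body is an assumption, not the code that was actually pushed.

```r
# Hypothetical sketch: the real generateRandomLogicals in utils.R is not shown in this thread.
generateRandomLogicals <- function(n)
{
   # runif(n) draws n values uniformly between 0 and 1;
   # round() maps each one to 0 or 1, and as.logical() turns 0/1 into FALSE/TRUE
   as.logical(round(runif(n, min=0, max=1)))
}

set.seed(37)   # fixed seed so the printed vector is the same on every run
x <- generateRandomLogicals(15)
print(x)
```

An equivalent approach would be sample(c(TRUE, FALSE), n, replace=TRUE), which skips the rounding step.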

paul-shannon commented 5 years ago

@aishahmohamed98 length(which(randoms)) will tell you how many of your returned vector are TRUE.
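As a small illustration of the counting idiom above, here is a sketch on a hand-written vector (the values in randoms are made up for the example). It also shows sum() and mean(), which work because R treats TRUE as 1 and FALSE as 0 in arithmetic.

```r
# Made-up example vector, standing in for the output of generateRandomLogicals
randoms <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

which(randoms)            # positions of the TRUE elements: 1 3 4
length(which(randoms))    # how many TRUEs: 3
sum(randoms)              # TRUE counts as 1, FALSE as 0, so also 3
mean(randoms)             # proportion of TRUEs: 0.6
```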

Keep in mind that runif is a pseudo-random (look that up!) function, so you will not know exactly what percentage of TRUEs you will get. By random bad luck, you could get very few!

One solution is to call set.seed(37) (where 37 could be any integer; it does not need to be prime) immediately before you call generateRandomLogicals(n). Experiment with different values of the seed until you get a pretty balanced count of TRUE and FALSE. Then make that your basic test.
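To see why a seed makes the basic test dependable, here is a short sketch. The one-line generateRandomLogicals is an assumed stand-in for the function in utils.R.

```r
# Assumed stand-in for the function in utils.R
generateRandomLogicals <- function(n) as.logical(round(runif(n)))

set.seed(37)
first <- generateRandomLogicals(50)

set.seed(37)                 # same seed again
second <- generateRandomLogicals(50)

identical(first, second)     # TRUE: the same seed replays the same sequence
sum(first)                   # the TRUE count is therefore the same on every run
```

Because the sequence is replayed exactly, a test written against one seed gives the same answer in every fresh R session.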

A better test is to call generateRandomLogicals many times - say 100. Determine the percentage of TRUEs for each call. Over 100 calls, it should average out to about 50%.

So I propose: improve your very short & simple test_generateRandomLogicals using set.seed(n) after your experiments allow you to cherry pick a value of n that gives a good result - that is, which avoids the unlikely but completely possible eventuality that you get, say, 3 TRUEs and 47 FALSEs.

Then write a new and more complicated test_generateRandomLogicals_monteCarlo (look up monte carlo & probability) which makes many calls, and only cares about the average TRUE/FALSE ratio across all of those calls.
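One possible shape for such a test is sketched below. It is not the test from the repo: the function body, the 0.45 to 0.55 bounds, and the stand-in generateRandomLogicals are all assumptions, and stopifnot() is used in place of checkTrue() only so that the block runs in base R.

```r
# Assumed stand-in for the function in utils.R
generateRandomLogicals <- function(n) as.logical(round(runif(n)))

test_generateRandomLogicals_monteCarlo <- function()
{
   set.seed(29)                 # set once, before the loop, for a repeatable run
   max.repetitions <- 100
   proportions <- vector(mode="numeric", length=max.repetitions)

   for(i in seq_len(max.repetitions)){
      randoms <- generateRandomLogicals(50)
      proportions[i] <- mean(randoms)     # fraction of TRUEs in this call
      }

   average <- mean(proportions)
   # illustrative bounds; a real test would use checkTrue() here
   stopifnot(average > 0.45, average < 0.55)
   invisible(average)
}

average <- test_generateRandomLogicals_monteCarlo()
```

The test ignores how any single call turns out and only checks the average across all 100 calls.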

This will be an excellent learning experience!

P.S. When I ran source("test_utils.R"); runTests() in a new R session, one of the tests failed. The lesson to be learned here: always run your tests, before a commit, from a brand new, completely clean R session. Then any lingering variables or functions still lying around from earlier work will not deceive you!

aishahmohamed98 commented 5 years ago

Paul, sorry for the late reply. I got so lost in researching and studying the two new terms, pseudo-random and set.seed, that I forgot to reply. This is definitely a challenge but I think I will be able to conquer it successfully!

-Aishah

aishahmohamed98 commented 5 years ago

@paul-shannon paul, i've checked in code with an update to the test_randomLogicals in test_utils.R that generates about a 50/50 true or false logicals when randomly generated using set.seed(n). Let me know if i am going in the direction you expected.

While delving into Monte Carlo and probability in R, I found myself getting lost in a sea of information on the internet and am struggling to find a rock to hold onto. Are there certain functions you would suggest looking into when creating test_generateRandomLogicals_MonteCarlo? (As far as I've read, MonteCarlo is a library package in R.) Or are montecarlo() and probability() the functions you suggest?

paul-shannon commented 5 years ago

better yet (I understand the sea [swamp?] of information you found), just focus on this:

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.

This applies here since

1) with one call to generateRandomLogicals(10), you may get 0 TRUEs, or 3, or 5 or 10
2) but ON AVERAGE, after MANY calls, you would get roughly 50% TRUEs, 50% FALSEs
3) so, like a gambler at Monte Carlo https://www.pokerstars.com/en/blog/2017/pokerstars-championship-monte-carlo--most-stylish-gambling-destination-165988.shtml?no_redirect=1, you try again and again, losing sometimes, winning sometimes but on average - if you are very skilled at figuring the odds - you come out where you expect (slightly ahead).

No need to call R functions beyond your use of sample().

Monte Carlo, for our purposes, is just a metaphor, an image, a general approach to getting predictable results. That is, run your random procedure enough times so that the expected (average) result is the in-fact actual average result you get.
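The idea that the average settles down as the number of calls grows can be seen in a short sketch. It uses sample() directly rather than the repo's function, and the names n.calls, true.counts and running.mean are made up for the illustration.

```r
set.seed(29)
n.calls <- 1000

# each entry is the number of TRUEs in one simulated call of size 10
true.counts <- vector(mode="numeric", length=n.calls)
for(i in seq_len(n.calls))
   true.counts[i] <- sum(sample(c(TRUE, FALSE), size=10, replace=TRUE))

range(true.counts)     # single calls can land far from 5

# average of the first k calls, for every k from 1 to n.calls
running.mean <- cumsum(true.counts) / seq_len(n.calls)
running.mean[c(1, 10, 100, 1000)]   # moves toward 5 as calls accumulate
```

Individual calls stay unpredictable; only the running average becomes predictable.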

Make sense?

There is still no real test in test_randomLogicals. You need to use

checkEquals(a, b)
checkTrue()

set.seed(29)
random.logicals <- generateRandomLogicals(50)
true.elements <- which(random.logicals)   # why does this work? what does which() do?
checkTrue(length(true.elements) > 20)
checkTrue(length(true.elements) < 30)

aishahmohamed98 commented 5 years ago

Got it! I was so confused but I see more clearly now with your input. I've pushed an updated test_utils.R with appropriate testing for 50/50 generated results. I had no idea I had pushed a .#utils.R# file, thank you for pointing that out! I will make a note of that and double check everything before presenting.

paul-shannon commented 5 years ago

Looks good, Aishah.

Now add a new test function which

aishahmohamed98 commented 5 years ago

@paul-shannon I've pushed an updated test_utils.R with the new function, test_generateRandomLogicals_montecarlo, following your instructions. I'm faced with a problem I can't seem to fix, though, and wanted your input. I've written the for loop and successfully appended each outcome to the list, but I am wondering why the generator doesn't produce a new random result each time it goes through the for loop. Any thoughts?

paul-shannon commented 5 years ago

Think about the effects of

set.seed(29)

In addition, please emulate ALL of my layout practices.
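The pushed code is not shown in this thread, so what follows is only one way this symptom can arise: calling set.seed inside the loop. The sketch compares that with seeding once before the loop. The stand-in generateRandomLogicals and the variable names are assumptions.

```r
# Assumed stand-in for the function in utils.R
generateRandomLogicals <- function(n) as.logical(round(runif(n)))

# Seed reset on every pass: each iteration replays the same sequence
counts.reseeded <- vector(mode="numeric", length=5)
for(i in 1:5){
   set.seed(29)
   counts.reseeded[i] <- sum(generateRandomLogicals(50))
   }

# Seed set once: the whole run is repeatable, but each pass draws fresh values
set.seed(29)
counts.seeded.once <- vector(mode="numeric", length=5)
for(i in 1:5)
   counts.seeded.once[i] <- sum(generateRandomLogicals(50))

print(counts.reseeded)       # five identical numbers
print(counts.seeded.once)
```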

aishahmohamed98 commented 5 years ago

oh, i see now! i've pushed a test_utils.R i feel confident in! Working on assigning random logical to node. Let me know your thoughts.

-Aishah

paul-shannon commented 5 years ago

better, but you have no actual test in the test. Look at all of the other test functions. THAT’s the form to emulate.

set.seed is useful. you deleted it. how does it help? where in your test function should it be?

where do you collect your results from repeated calls to generateRandomLogicals()?

paul-shannon commented 5 years ago

here's a related example. study it carefully. ask me about anything which does not make COMPLETE sense. the vector function is probably new to you.

set.seed(17)
max.repetitions <- 100
results <- vector(mode="numeric", length=max.repetitions)   # mode must be a quoted string
for(i in 1:max.repetitions)
   results[i] <- mean(runif(n=100, min=1, max=1000))
print(mean(results))

aishahmohamed98 commented 5 years ago

Paul, i can see what you're saying now. and vector was new to me so i used list() thinking it would work the same. Will do!

paul-shannon commented 5 years ago

list() or c() is sensible also, used like this:

results <- c()
for(i in 1:max.repetitions){
   newResult <- someCalculationHappensHere()   # placeholder for your own calculation
   results <- c(results, newResult)
   }

In other words, you create an empty vector, then append to it each time through the loop.

The hidden problem with this: every time through the loop you actually create a NEW results vector. And each time it is a little bigger. In a small example like this, you won’t notice this extravagance.

But in a large, long program, dealing with bigger data, you’ll be creating and copying memory in a very wasteful way.

So better to create a vector to start with, and ask R to allocate its full length, right at the start. Then each time through the loop you just assign into the next available slot in the already fully existent data structure.
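The two approaches can be put side by side in a short sketch. The names grown and preallocated are made up for the comparison; both loops produce the same numbers, and only the memory behavior differs.

```r
max.repetitions <- 1000

# Approach 1: start empty and append; c() builds a new, longer vector on each pass
set.seed(17)
grown <- c()
for(i in seq_len(max.repetitions))
   grown <- c(grown, mean(runif(n=100)))

# Approach 2: allocate the full length once, then assign into each slot
set.seed(17)
preallocated <- vector(mode="numeric", length=max.repetitions)
for(i in seq_len(max.repetitions))
   preallocated[i] <- mean(runif(n=100))

identical(grown, preallocated)   # TRUE: same values either way
```

Wrapping each loop in system.time() with a much larger max.repetitions is one way to see the cost difference on a given machine.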

aishahmohamed98 commented 5 years ago

I see. Using the vector example you provided, is it best to follow the mode="numeric", length=max.repetitions form, or can I substitute something else? I was thinking of just putting a numerical value in the length parameter.

paul-shannon commented 5 years ago

remember DRY: do not repeat yourself. max.repetitions is used twice, therefore assign it to a variable in exactly one place and reuse that variable where needed

then if you want to repeat, e.g., 1024 times, you only have to change the number in one spot.

max.repetitions <- 10
results <- vector(mode="numeric", length=max.repetitions)
for(i in 1:max.repetitions){
   # do stuff
   }
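On the question of what else can go in the vector() call, a brief sketch of the common modes may help. The variable names a and b are made up for the comparison.

```r
max.repetitions <- 10

a <- vector(mode="numeric", length=max.repetitions)   # ten zeros
b <- numeric(max.repetitions)                         # shorthand for the same thing
identical(a, b)                                       # TRUE

vector(mode="logical", length=3)      # FALSE FALSE FALSE
vector(mode="character", length=2)    # "" ""
vector(mode="list", length=2)         # a list holding two NULLs
```

The mode decides the type and the starting values; the length is best taken from the one variable that also drives the loop.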

aishahmohamed98 commented 5 years ago

@paul-shannon paul, i've pushed test_utils.R with an updated montecarlo test that generates 200 random T/F 10 times and checks to see the average of the results. Let me know of any thoughts or opinions.

aishahmohamed98 commented 5 years ago

Since the average should be around 50/50, which here means about 100 TRUEs, I've made the test check that the output falls between 90 and 110, and it passes! Thoughts?
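How generous the 90 to 110 window is can be estimated with a little arithmetic. This sketch assumes the test averages the TRUE counts from 10 calls of 200 logicals each, with each logical TRUE with probability 0.5; the actual test code is not shown in this thread.

```r
n <- 200       # logicals per call (from the description above)
p <- 0.5       # assumed probability of TRUE
runs <- 10     # number of calls averaged

expected.count <- n * p                    # 100
sd.single <- sqrt(n * p * (1 - p))         # spread of one call's TRUE count, about 7.07
sd.of.average <- sd.single / sqrt(runs)    # spread of the 10-call average, about 2.24
tolerance.in.sds <- 10 / sd.of.average     # the +/- 10 window is about 4.5 of these
```

Under these assumptions the window is wide enough that a correct generator should rarely fail the test, while a badly skewed one would still be caught.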