C-V2X-Senior-Design / CV2X_MachineLearning

Sim.py Serialization #1

Open jasoninirio opened 2 years ago

jasoninirio commented 2 years ago

@gefa The serialization returns the frames as a 2D array rather than a single 1D array (do you want a 1D array per frame instead?). The first index of the 2D array is the frame number: serialization_data[x] is frame x (e.g. serialization_data[0] is frame 0).
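To make the layout concrete, here is a minimal sketch of the "1D array per frame" option, assuming each frame is a 10x10 list of lists. `flatten_frame` and `serialize_frames` are hypothetical names for illustration, not functions from Sim.py.

```python
def flatten_frame(frame):
    """Flatten one 2D frame (rows of RBs) into a 1D list, row by row."""
    return [cell for row in frame for cell in row]


def serialize_frames(frames):
    """Return a 2D array holding one flattened 1D row per frame."""
    return [flatten_frame(frame) for frame in frames]


# Two hypothetical 10x10 frames: all RBs in use, then one RB zeroed out.
frame0 = [[1] * 10 for _ in range(10)]
frame1 = [row[:] for row in frame0]
frame1[3][7] = 0

serialization_data = serialize_frames([frame0, frame1])
print(len(serialization_data), len(serialization_data[0]))  # 2 100
print(serialization_data[1].index(0))  # 37, i.e. row 3 * 10 + column 7
```

With this layout, serialization_data[x] is still frame x, but each frame is a flat list of 100 values.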

Is this the right approach? I figured that an RB full of 1s means that no jammer is affecting it. Does a single 0 mean that the RB is being jammed? I want to make sure I am implementing these methods correctly.

jasoninirio commented 2 years ago
        if jamType==1: # narrow band jammer
            # jam every frame with one pixel,
            # ASSUME once jammed RB stays zero for a long time after
            # TODO we should think how to model this...
            grid[jamSF][jamSC] = 0

            # jump by 1 pixel
            # TODO wrap in a smarter way, more like what happens in reality
            # TODO change random walk to explore whole grid more uniformly
            jamSC = (jamSC + choice([1,0,-1]))%10
            jamSF = (jamSF + choice([1,0,-1]))%10
        _frames.append(grid)

This part of the code seems to confirm my assumption.
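For reference, here is a self-contained sketch of the same narrow-band jammer loop that can be run on its own. It assumes a 10x10 grid that starts as all 1s and a random starting position for the jammer. `simulate_narrowband` and its parameters are hypothetical names, not part of Sim.py. The excerpt does not show whether `grid` is rebuilt for every frame, so the sketch stores a copy of the grid per frame; that copy is my assumption, not necessarily what Sim.py does.

```python
from random import Random


def simulate_narrowband(num_frames, size=10, seed=0):
    """Random-walk narrow-band jammer on a size x size grid of 1s (sketch)."""
    rng = Random(seed)
    grid = [[1] * size for _ in range(size)]
    # ASSUMPTION: the jammer starts at a random resource block.
    jam_sf, jam_sc = rng.randrange(size), rng.randrange(size)
    frames = []
    for _ in range(num_frames):
        grid[jam_sf][jam_sc] = 0  # a jammed RB stays 0 in later frames
        # Step by at most one cell per axis; the modulo wraps at the edges.
        jam_sc = (jam_sc + rng.choice([1, 0, -1])) % size
        jam_sf = (jam_sf + rng.choice([1, 0, -1])) % size
        # Store a snapshot; appending `grid` itself would alias every frame.
        frames.append([row[:] for row in grid])
    return frames


frames = simulate_narrowband(num_frames=5)
zeros_per_frame = [sum(row.count(0) for row in f) for f in frames]
print(zeros_per_frame)  # starts at 1 and never decreases
```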

gefa commented 2 years ago

Hi @jasoninirio,

I was using 1s to represent channel usage (aka resource block allocation). I was using 0s to "jam" (hence grid[jamSF][jamSC] = 0). Jamming a particular resource block turns it from "channel in use" (1) to "channel not in use" (0). Feel free to change this if it fits the ML model better. After all, it's easy to flip 1s to 0s or vice versa.

And yes, you'll probably need serialization. Thanks for adding that and the pseudodata. I assume you just randomly added 1s and 0s in a 10x10 grid.

INPUT DATA FORMAT: Now we need to decide the exact shape in which we feed the data to the ML model. Intuitively, I think we need to feed multiple 10x10 grids into the model at once, because the model will detect malicious grid use through changes in block use. A single 10x10 grid only gives you an instantaneous usage "snapshot"; we need at least two 10x10 grids to get the "difference" in block allocation. Does this make sense? This raises the question: should we just feed the difference into the ML model? But let's not, for now.
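Just to illustrate what I mean by the "difference", here is a small sketch that compares two consecutive grids element by element. It assumes plain 10x10 lists of lists, and `grid_diff` is a hypothetical helper, not something that exists in Sim.py.

```python
def grid_diff(prev, curr):
    """Return a grid with 1 wherever an RB changed between two snapshots."""
    return [[int(a != b) for a, b in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


before = [[1] * 10 for _ in range(10)]
after = [row[:] for row in before]
after[2][5] = 0  # one RB flips from "in use" to "not in use"

diff = grid_diff(before, after)
changed = [(r, c) for r, row in enumerate(diff) for c, v in enumerate(row) if v]
print(changed)  # [(2, 5)]
```

A single grid cannot show this change on its own, which is why at least two snapshots are needed.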

JAMMED INPUT: Let's say each of the 1000 pseudodata test cases has O=5 10x10 grids (O for observation time). You'll serialize these O=5 grids in the same order for every training and testing trial. Let's say malicious (jammed) test cases have bit flips lasting for only J=1 10x10 grid instance (J for jamming time). Does this make sense?

NOT JAMMED INPUT: A normal, benign resource grid test case consists of O=5 identical 10x10 grids, that is, with no bit flips present.

Let's say you randomly corrupt 50% of these 1000 test cases with these short J=1 bit flips.
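A rough sketch of the recipe above, using only the Python standard library. The values 1000, O=5, J=1 and 50% come from this comment; everything else is an assumption for illustration: `make_case` and `make_dataset` are hypothetical names, the benign grid is filled with random 1s and 0s, and a jammed grid has exactly one inverted RB.

```python
import random

N_CASES, O, J, SIZE = 1000, 5, 1, 10  # values from the discussion above


def make_case(rng, jammed):
    """Return O serialized 10x10 grids; a jammed case has J altered grids."""
    base = [[rng.randint(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
    grids = [[row[:] for row in base] for _ in range(O)]
    if jammed:
        start = rng.randrange(O - J + 1)  # first jammed grid instance
        r, c = rng.randrange(SIZE), rng.randrange(SIZE)
        for t in range(start, start + J):
            grids[t][r][c] ^= 1  # ASSUMPTION: one flipped RB per jammed grid
    # Serialize in a fixed order: grid 0 first, each grid row by row.
    return [cell for grid in grids for row in grid for cell in row]


def make_dataset(seed=0):
    """Build all test cases with exactly half of them jammed (label 1)."""
    rng = random.Random(seed)
    jammed_ids = set(rng.sample(range(N_CASES), N_CASES // 2))
    labels = [int(i in jammed_ids) for i in range(N_CASES)]
    data = [make_case(rng, bool(label)) for label in labels]
    return data, labels


data, labels = make_dataset()
print(len(data), len(data[0]), sum(labels))  # 1000 500 500
```

Each test case is a flat list of O * 10 * 10 = 500 values, and the label is 1 for jammed and 0 for benign.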

Could you modify the pseudodata like this by the end of today, so we can have a relaxing extended weekend? :) And please tell me how long it takes to train on, say, 20% of these 1000 test cases. This is just to make sure that the size of this input data is reasonable for your chosen ML model. Then we'll look into the accuracy of detection, adding noise, changing some parameters, etc.
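One possible way to measure the training time, sketched with the standard library only. `train_model` is a PLACEHOLDER for whatever training call your chosen model uses, and the data here is a dummy stand-in with the same shape as the pseudodata (1000 cases of 500 values); swap in the real ones.

```python
import random
import time


def train_model(samples, labels):
    """PLACEHOLDER for the real training call; here it only counts labels."""
    return {"n": len(samples), "jammed": sum(labels)}


def time_training(data, labels, fraction=0.2, seed=0):
    """Train on a random `fraction` of the test cases and time the call."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), int(len(data) * fraction))
    subset = [data[i] for i in idx]
    subset_labels = [labels[i] for i in idx]
    start = time.perf_counter()
    model = train_model(subset, subset_labels)
    return model, time.perf_counter() - start


# Dummy stand-in data: 1000 serialized cases of 5 * 10 * 10 = 500 values.
data = [[1] * 500 for _ in range(1000)]
labels = [i % 2 for i in range(1000)]
model, seconds = time_training(data, labels)
print(model["n"], f"{seconds:.6f} s")
```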