Aayush-Ankit / puma-simulator

[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory), and runs ML models compiled using the puma compiler.
MIT License

Error when running #41

Open · leibo-hust opened this issue 3 years ago

leibo-hust commented 3 years ago

I want to test the mlp_l4_mnist model. At the last step I get the following error, and I don't know how to fix it. Thanks!

python dpe.py -n mlp

Error: (screenshot)

negishubham commented 3 years ago

Hi,

We have provided some test scripts in the simulator (https://github.com/Aayush-Ankit/puma-simulator/tree/training/test/utils). Please try the mlp_layer.sh script to run the MLP model; it is well tested and runs the mlp_l4_mnist model. If you still get the above error, please let us know. There were a few updates related to this error earlier, so please make sure you are using the most recent version of the simulator from GitHub.

Thanks, Shubham

leibo-hust commented 3 years ago

@negishubham Thanks. I'm actually using the latest version, which I cloned a few hours ago. Should I use the training branch?

leibo-hust commented 3 years ago

@negishubham Hi, I just ran the run-mlp-layer.sh script with the default fully-connected-layer model. There were some small errors, such as a missing populate.py file. I fixed those, but the final run is still incorrect (screenshot), and I think I've seen a similar error before.

Thanks, Bo Lei

leibo-hust commented 3 years ago

I was using the default i_mvm before. When I changed it to the following definition (screenshot), I got the same error as the first one (screenshot).

negishubham commented 3 years ago

Hi,

Yes, please use the training branch. Please follow all the instructions in this file (https://github.com/Aayush-Ankit/puma-simulator/blob/training/how_to_run.md) before running the test scripts; this will help with the populate.py errors. It includes instructions for copying a few files from the simulator to the compiler.

I don't think you need to change the i_mvm function in the default code for inference. If you are doing something for training, please follow the comments in the code. I would suggest first checking the setup with inference, without changing anything in the src files, and running the test scripts for both the MLP and conv layers.

Thanks Shubham

leibo-hust commented 3 years ago

I followed the steps in how_to_run.md exactly from the beginning. When running generate-py.sh, I get some similar errors that seem to be related to instrn_proto.py. I didn't modify that file and used the original. I don't know whether the AssertionError is related to instrn_proto.py, but I do get the mlp folder (screenshot).

Finally, when I run python dpe.py -n mlp, I get the following error (screenshot). I checked the subdirectories of the mlp folder and found only *.npy files in tile0 and tile1.

Thanks Bo Lei.

leibo-hust commented 3 years ago

@negishubham Thanks to @FrankWu1998's help, I solved the problem (though I'm not sure how). I found that I can't use the -t parameter, and num_tile_compute in config.py doesn't seem to affect the results. Also, if you want to test the mlp model, you should use the default instrn_proto.py.

negishubham commented 3 years ago

Hi,

Thanks, @FrankWu1998, for helping. The assertion error might be due to not setting the number of matrices correctly in the compiler (as mentioned in how_to_run.md). But it's good that you solved it.

Regarding the num_tile_compute parameter: in the current version, a function inside dpe.py calculates the number of compute tiles itself, so you no longer need to set it manually.
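The thread does not show the dpe.py function mentioned here, but the arithmetic behind it (and behind the num_tile_compute = 3 that a later comment derives for a 5-tile mlp model) can be sketched. This minimal Python sketch assumes the compiled model folder has one subdirectory per tile named tile0, tile1, ..., and, following the config.py comment quoted later in the thread, that two of those tiles are the input and output tiles. The helpers count_tiles and compute_tiles are hypothetical, not simulator functions.

```python
import re

# Matches directory names such as "tile0", "tile1", ..., "tile12".
TILE_DIR = re.compile(r"tile\d+")


def count_tiles(entry_names):
    """Count entries named tile<N>, e.g. from os.listdir() on a compiled model folder.

    Hypothetical helper: assumes one directory per tile, as in the tile0..tile4
    folders reported for the mlp model in this thread.
    """
    return sum(1 for name in entry_names if TILE_DIR.fullmatch(name))


def compute_tiles(total_tiles, non_compute_tiles=2):
    """Tiles left for DNN compute after excluding the input and output tiles (assumed 2)."""
    if total_tiles <= non_compute_tiles:
        raise ValueError("need at least one compute tile besides the input/output tiles")
    return total_tiles - non_compute_tiles


# Example with the layout described for the mlp model (tile0 to tile4):
entries = ["tile0", "tile1", "tile2", "tile3", "tile4", "mlp.npy"]  # hypothetical listing
total = count_tiles(entries)
print(total, compute_tiles(total))  # 5 3
```

Since recent versions compute this automatically, a check like this is only useful for sanity-checking a compiled folder, not for setting anything by hand.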

amankr1279 commented 3 years ago

@leibo-hust @negishubham I am facing the same AssertionError that leibo mentioned on March 22. Please help. A screenshot is attached for reference.

Thanks, Aman Kumar

deepika7497 commented 3 years ago

Hi @amankr1279,

This error probably occurs because the number of constant MVMUs in the puma-compiler's common.h file is not equal to num_matrices in the puma-simulator's config file. Please make sure they are the same and try running again.

Hope this helps. Regards, Deepika
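Because this mismatch is easy to miss, a small pre-flight check could compare the two values before running generate-py.sh and dpe.py. The sketch below is an illustration under assumptions, not part of either repository. It assumes common.h sets N_CONSTANT_MVMUS_PER_CORE either as "#define N_CONSTANT_MVMUS_PER_CORE 6" or as "N_CONSTANT_MVMUS_PER_CORE = 6", and that config.py contains a line such as "num_matrix = 2" (the names used later in this thread). All function names are hypothetical.

```python
import re


def read_compiler_mvmus(common_h_text):
    """Return N_CONSTANT_MVMUS_PER_CORE from the text of puma-compiler's common.h.

    Accepts "#define N_CONSTANT_MVMUS_PER_CORE 6" or "N_CONSTANT_MVMUS_PER_CORE = 6"
    (assumed layouts).
    """
    match = re.search(r"N_CONSTANT_MVMUS_PER_CORE\s*=?\s*(\d+)", common_h_text)
    if match is None:
        raise ValueError("N_CONSTANT_MVMUS_PER_CORE not found in common.h text")
    return int(match.group(1))


def read_simulator_matrices(config_py_text):
    """Return num_matrix from the text of puma-simulator's config.py (assumed 'num_matrix = N')."""
    match = re.search(r"^\s*num_matrix\s*=\s*(\d+)", config_py_text, re.MULTILINE)
    if match is None:
        raise ValueError("num_matrix not found in config.py text")
    return int(match.group(1))


def mvmu_settings_match(common_h_text, config_py_text):
    """True when the compiler's constant MVMUs per core equals the simulator's num_matrix."""
    return read_compiler_mvmus(common_h_text) == read_simulator_matrices(config_py_text)


# The values reported later in this thread: 6 in common.h versus 2 in config.py.
print(mvmu_settings_match("#define N_CONSTANT_MVMUS_PER_CORE 6\n", "num_matrix = 2  # per core\n"))
# False
```

In practice you would pass in the contents of the two files (for example via open(path).read()). The paths depend on where the repositories were cloned, so they are left out here.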

amankr1279 commented 3 years ago

Hi @deepika7497 ,

Thanks for responding. In common.h, N_CONSTANT_MVMUS_PER_CORE = 6, and in config.py, num_matrix = 2. I changed the constant MVMU count to 2, and ./generate-py.sh then ran without the AssertionError. However, running python dpe.py -n lstm should have stopped at 10,000 cycles, but it kept running until ~30k cycles (maybe due to the 68 tiles), though I did get a full simulation. Thanks for helping.

Regards Aman

msabri1372 commented 3 years ago

I have the same problem. In common.h, N_CONSTANT_MVMUS_PER_CORE = 6, and I have changed num_matrix to 6 as well. I'm using mlp, and the number of tiles is 5 (tile0 to tile4). I also changed num_tile_compute to 7, but I still have the problem. Please help me. (screenshot)

deepika7497 commented 3 years ago

Hi @msabri1372,

It looks like you probably missed a step: either you did not copy the correct folder from the compiler to the simulator, or you forgot to run generate-py.sh. Please follow the steps again and check. If this happens again, please let us know.

Regards, Deepika

U201814647 commented 3 years ago

Hi @deepika7497

I believe I have followed the steps in how_to_run.md exactly from the beginning. I used the default instrn_proto.py and changed num_matrix to 6. Because the mlp model has 5 tiles and the comment says num_tile_compute is the number of tiles mapped by the DNN (excluding the input and output tiles), I changed num_tile_compute to 3. Is there anything wrong with what I did? The problem appears when I run python dpe.py -n mlp. Please help me, thank you.

(screenshot)

Regards U201814647

msabri1372 commented 3 years ago

Hi @deepika7497

I have followed the instructions in how_to_run.md again, but the problem remains.

Best regards,

amankr1279 commented 3 years ago

Hello @negishubham @deepika7497 and others.

While running the python dpe.py -n nmt command, I get the following error. Please help.

Traceback (most recent call last):
  File "dpe.py", line 231, in <module>
    DPE().run(net)
  File "dpe.py", line 160, in run
    node_dut.node_run(cycle)
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/node.py", line 82, in node_run
    self.tile_list[i].tile_run (cycle, self.tile_fid_list[i])
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile.py", line 275, in tile_run
    [tag_hit, data] = self.receive_buffer.read (vtile_id)
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile_modules.py", line 75, in read
    if (not self.isempty(vtile_id)):
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile_modules.py", line 58, in isempty
    if (self.buffer[vtileid]['valid']):
IndexError: list index out of range
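The last frame indexes self.buffer[vtileid], so the IndexError means the receive buffer has fewer entries than the virtual-tile id carried by an instruction. One plausible reading (an assumption, consistent with the maintainer's reply about tile counts) is that the compiled program references more tiles than the simulator allocated buffer entries for. The class below is a hypothetical stand-in, not the simulator's tile_modules.py. It mirrors only the buffer[vtile_id]['valid'] pattern from the traceback and replaces the bare IndexError with a message naming the offending id.

```python
class ReceiveBufferSketch:
    """Hypothetical receive buffer with one entry per virtual tile.

    Mirrors only the indexing pattern self.buffer[vtile_id]['valid'] from the
    traceback; it is not the simulator's implementation.
    """

    def __init__(self, num_vtiles):
        self.buffer = [{"valid": False, "data": None} for _ in range(num_vtiles)]

    def _check(self, vtile_id):
        # A plain list lookup would raise "IndexError: list index out of range"
        # (or silently wrap around for negative ids); fail with a clearer message.
        if not 0 <= vtile_id < len(self.buffer):
            raise IndexError(
                f"vtile_id {vtile_id} is outside the {len(self.buffer)} buffer entries; "
                "the compiled program may use more tiles than the simulator allocated"
            )

    def isempty(self, vtile_id):
        self._check(vtile_id)
        return not self.buffer[vtile_id]["valid"]

    def write(self, vtile_id, data):
        self._check(vtile_id)
        self.buffer[vtile_id] = {"valid": True, "data": data}


buf = ReceiveBufferSketch(num_vtiles=2)
buf.write(1, [0.5, 0.25])
print(buf.isempty(0), buf.isempty(1))  # True False
try:
    buf.isempty(5)
except IndexError as err:
    print(err)
```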

negishubham commented 3 years ago

Hi @amankr1279

How did you select the value for variable "nmt"? You don't need to give the number of tiles manually, it is internally calculated in the simulator files. Please follow this reply: https://github.com/Aayush-Ankit/puma-simulator/issues/41#issuecomment-802583365 There are some test scripts in the same folder for CNN as well.

Thanks, Shubham

amankr1279 commented 2 years ago

Hi @negishubham, @deepika7497 and others. We did a tile-wise analysis of the workload distribution across tiles and found that some were heavily loaded while others were relatively idle. So we are devising an algorithm that identifies instructions suitable for shifting. Currently, we ensure that the core_num of a shifted instruction is the same in the original and the new tile.
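A tile-wise load analysis like the one described could start by counting instructions per tile and flagging tiles well above or below the mean. The sketch below assumes the instructions have already been parsed into (tile_id, core_id, opcode) tuples. That format, the 1.5 threshold, and the function names are all hypothetical, since the thread does not say how the .puma files were analysed.

```python
from collections import Counter
from statistics import mean


def instructions_per_tile(instructions):
    """Count instructions per tile from (tile_id, core_id, opcode) tuples (hypothetical format).

    Tiles with no instructions do not appear in the result.
    """
    return Counter(tile_id for tile_id, _core_id, _opcode in instructions)


def classify_load(counts, threshold=1.5):
    """Return (heavy, light) tile ids relative to the mean instruction count.

    heavy: count > threshold * mean; light: count < mean / threshold.
    """
    avg = mean(counts.values())
    heavy = sorted(t for t, c in counts.items() if c > threshold * avg)
    light = sorted(t for t, c in counts.items() if c < avg / threshold)
    return heavy, light


# Made-up example: tile 2 carries most of the work.
program = [(2, 0, "mvm")] * 10 + [(3, 0, "alu")] * 2 + [(4, 1, "mvm")] * 4
counts = instructions_per_tile(program)
print(classify_load(counts))  # ([2], [3])
```

A candidate shift would then move instructions from a heavy tile to a light one while keeping core_num unchanged, which is the constraint described above.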

However, after shifting the instructions, when I simulate the new .puma files in PUMASim, I get the following error (screenshot attached). This happens with any instruction I shift. FYI, I have also taken care of the tile send/receive instructions so that the data flow is not changed. Please help.
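When instructions move between tiles, one cheap sanity check is that every send still has a matching receive on its destination tile, and vice versa. The sketch below assumes sends and receives have been extracted into (source_tile, dest_tile) and (dest_tile, source_tile) pairs respectively. This is a hypothetical format that ignores any other fields the real instructions carry, so it can only catch tile-level mismatches.

```python
from collections import Counter


def unmatched_transfers(sends, receives):
    """Compare tile-to-tile sends with receives after instructions are moved.

    sends:    iterable of (source_tile, dest_tile), one per send instruction
    receives: iterable of (dest_tile, source_tile), one per receive instruction
    Returns (sends_without_receive, receives_without_send) as Counters keyed by
    (source_tile, dest_tile); both are empty when every transfer is paired.
    """
    send_counts = Counter(sends)
    recv_counts = Counter((src, dst) for dst, src in receives)
    # Counter subtraction keeps only positive counts, i.e. the unpaired leftovers.
    return send_counts - recv_counts, recv_counts - send_counts


# A send moved from tile 2 to tile 5 without updating the receive on tile 3:
orphan_sends, orphan_recvs = unmatched_transfers([(5, 3)], [(3, 2)])
print(dict(orphan_sends), dict(orphan_recvs))  # {(5, 3): 1} {(2, 3): 1}
```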

xlonghu commented 2 years ago

Hi @amankr1279 and others. How did you solve the "IndexError: list index out of range" that occurs when running "python dpe.py -n nmt"? Another problem: running ./vgg16.test or other commands requires a lot of memory, so the program gets killed. What should be done to fix this?

Regards, Xlong Hu