GaloisInc / BESSPIN-Tool-Suite

The core tool of the BESSPIN Framework.

Run FreeRTOS on Michigan:chisel_p1 on AWS:firesim #350

Closed rtadros125 closed 4 years ago

rtadros125 commented 4 years ago

@austinharris @taru-verma: @rod-chapman has tried to run the tool on the recently uploaded Michigan binaries, but nothing is coming out of the UART. I am not even sure if they were supposed to boot, given that we use the vanilla Clang compiler. Please advise.

rod-chapman commented 4 years ago

Ah yes... given a vanilla clang, I have no idea what might happen...

@austinharris @taru-verma - I'll be working on integrating your binaries with our tooling. What success (or otherwise) have you had with building and running FreeRTOS with your clang and target?

Oh.. I am based in the UK, so in UTC+1 timezone, so my hours are a little weird... are you both based in Ann Arbor?

taru-verma commented 4 years ago

@rod-chapman I think @austinharris might be able to give you a better idea as to how everything should be put together and run but I can talk about building with clang.

I branched off of FreeRTOS develop from before FatFS and Clang support were added and I manually integrated the changes for the latter. We have our clang with extensions for the Morpheus platform built. I can build and run examples like main_blinky and main_tcp as well as our app M3DB on the 32-bit P1 target (there are some issues with the TCP stack which I'm working on). I've tried the AGFI provided via the instructions in CloudGFE and the AGFIs which Austin is developing and they work.

Also I'm based in Ann Arbor (UTC-4) but I think Austin is in TX (UTC-5).

austinharris commented 4 years ago

@rtadros-Galois No, vanilla binaries do not work. Taru can upload his working FreeRTOS binaries.

rtadros125 commented 4 years ago

Ok got it. This explains it. When you manage to build our app using your toolchain, please open a PR on SSITH-FETT-Binaries. Please add it to the path SSITH-FETT-Binaries/Michigan/osImages/firesim/FreeRTOS.elf, and assign me as a reviewer so I can merge it. Thank you!

rod-chapman commented 4 years ago

OK. I read your paper from ASPLOS 2019 so I have some idea what to expect. If you have trouble building our app (main_fett.c) then drop us a line here.

rtadros125 commented 4 years ago

Per the discussion in #385, this seems to be ready.

rtadros125 commented 4 years ago

Per our meeting today, Michigan will be using their own app stack. I will follow up with more details when I start running it myself.

rtadros125 commented 4 years ago

@taru-verma @austinharris I did the following:

Script done on Thu 18 Jun 2020 09:48:55 PM UTC



And it disappears right away. Please advise. Thanks.

Also, the documentation shows using the Linux server. Is there any documentation on using the FreeRTOS app?

austinharris commented 4 years ago

Did you run the setup.sh script from the instructions here: https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/blob/develop/FireSim/minimal_cloudgfe.md ?

taru-verma commented 4 years ago

I tried FreeRTOS.elf (downloaded from the repo) and it works fine for us. Please let us know if running setup.sh solves the issue, thanks!

I usually run the app with main_tcp.img and main_tcp.dwarf from the example Cloud GFE binaries, but I don't think it matters. I ran them just now with blinky ones and it ran fine.

rtadros125 commented 4 years ago

I had already run setup.sh; sorry I forgot to mention it. Any other thing I should be aware of?

I am launching a fresh instance now to try again. Can you please give me the AMI ID you're using? Can you please also try it on a fresh instance to double check that we're not missing a config step?

taru-verma commented 4 years ago

I don't think so; that's all the steps I did. This should start the M3DB app. It will say that it's opening the database, which should take ~3 minutes. It'll then start the TCP server.

You need to set the tap0 interface up (the script doesn't bring it up automatically) - sudo ip link set tap0 up. And then you ping it with the IP address shown over UART, which will be 172.16.0.2. Once you can successfully ping it, you can open http://172.16.0.2:9443/help in a browser; I use lynx terminal browser but Firefox should work fine as well (Chrome might have issues).

I ran it on AGFI=agfi-06cc46557aca249fb (same as the one in agfi_id.json).
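A minimal sketch of the steps described above (bring `tap0` up, then wait until the target answers pings), assuming a Linux host with `ip` and iputils `ping` available. The function names, the 300-second timeout, and the 5-second polling interval are illustrative assumptions, not part of the BESSPIN or Michigan tooling; only the interface name, the `ip link` command, and the IP address come from the thread.

```python
import subprocess
import time

TAP_IFACE = "tap0"        # interface name quoted in the thread
TARGET_IP = "172.16.0.2"  # address the thread says is shown over UART


def bring_up_tap(iface=TAP_IFACE, run=subprocess.run):
    """Run `sudo ip link set <iface> up`, the command quoted above."""
    return run(["sudo", "ip", "link", "set", iface, "up"], check=True)


def ping_once(host, run=subprocess.run):
    """Send one ICMP echo request; True when ping exits with status 0."""
    result = run(["ping", "-c", "1", "-W", "2", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for_target(host=TARGET_IP, timeout_s=300.0, interval_s=5.0,
                    ping=ping_once, sleep=time.sleep, clock=time.monotonic):
    """Poll until the target answers a ping or timeout_s has elapsed.

    The default timeout is an assumption chosen to exceed the roughly
    three minutes the database is said to take to open.
    """
    deadline = clock() + timeout_s
    while True:
        if ping(host):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Typical use would be `bring_up_tap()` followed by `wait_for_target()`, and only then opening the help page in a browser.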

rtadros125 commented 4 years ago

I got the same error using fett.py, which gets the tap0 up.

Which AMI are you using?

taru-verma commented 4 years ago

It is the default FPGA 1.6.0.

austinharris commented 4 years ago

@rtadros-Galois are you able to run with just Dylan's default instructions and stock agfi?

rtadros125 commented 4 years ago

@austinharris I just ran it on the same exact terminal, and blinky ran fine.

rtadros125 commented 4 years ago

I will now start a fresh instance, try it, and attach the bash history.

taru-verma commented 4 years ago

@rtadros-Galois I tried everything on a fresh instance. For me, blinky worked, but our app didn't. It failed even before yours did.

Then, I copied over a version of s3://firesim-localuser/swpkgs/firesim-cloudgfe-chisel-p1-sw.tgz from a few days ago (the one which works for us) to the new instance, and it works - our app launches.

This seems to be an issue in the firesim-cloudgfe-chisel-p1-sw package. I'm putting the working version in the S3 bucket. You can copy it over, extract it (following the same instructions), and then if blinky works, try FreeRTOS.elf with our AGFI.

It's available at s3://morpheus-toolchain-share/firesim-cloudgfe-chisel-p1-sw.tgz. Please let me know if this works, thanks!

rtadros125 commented 4 years ago

I have created a fresh f1.2xlarge based on ami-02b792770bf83b668, which is FPGA 1.6.0. You can see my whole terminal session in the attached file, starting from saying yes to the SSH prompt on the fresh instance. I am getting the same error. My conclusion: either you guys did something extra, or I am using the wrong files. I ran md5sum on each file in the directories I used. If you find a discrepancy, please let me know so I can change the file and retry. Thank you.

mich-test-full-terminal.txt
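The checksum comparison mentioned above can be automated. The following is a sketch only, with hypothetical helper names: it builds an MD5 manifest for every file under a directory and lists the relative paths that differ between two manifests, which is roughly what comparing two `md5sum` listings by hand achieves.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(directory):
    """Map each file's path (relative to directory) to its MD5 digest."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): md5_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(mine, theirs):
    """List paths whose digests differ or that exist on one side only."""
    return sorted(
        name for name in set(mine) | set(theirs)
        if mine.get(name) != theirs.get(name)
    )
```

Each side would run `md5_manifest` on its own directory and exchange the result; an empty list from `diff_manifests` means the two file sets match.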

rtadros125 commented 4 years ago

Oh. Now this makes more sense. The wrong binaries were committed on the binaries repo then.

taru-verma commented 4 years ago

I just found out that the firesim-cloudgfe-chisel-p1-sw package seems to have been changed in the past few days. The binaries committed to the repo were from the previous version which is working and has been put into an S3 bucket for you to use as per my last message.

I copied over the latest version of firesim-cloudgfe-chisel-p1-sw ~30 minutes ago into a new instance (which has the default binaries - we made no changes). And that doesn't work either - my terminal matches yours until the line TraceRV: Warning: No +tracefileN given! and we have nothing after that (blank).

The S3 version which I just put up should work fine - that has the correct binaries. You can use that with FreeRTOS.elf. I was able to get it to run on a fresh instance.

All the best!

rtadros125 commented 4 years ago

The AGFI in the bucket you just pointed to is different from the one on FETT-Binaries too. Is this the last one?

taru-verma commented 4 years ago

Nope, the AGFI needs to be replaced with agfi-06cc46557aca249fb (the one you tried with).

This is just the default version from a while back.

rtadros125 commented 4 years ago

I see m3db_run now. I will create a PR on SSITH-FETT-Binaries today, and assign you as reviewers. Then, tomorrow I will work on running tests on your m3db app so that we can be confident that our tool base is able to boot your application properly, and that networking is all set when the competition starts. I will also add you guys as reviewers when I open a PR.

taru-verma commented 4 years ago

That's awesome, thanks for the update Ramy! I'm not sure why the latest version of the Chisel P1 Software package doesn't work, but the one in the S3 bucket should be fine.

You can also use the toolchain that I shared earlier to build your own apps to run on our platform. The generated ELF needs to be encrypted with the elf-parser tool in there before running it (that's already been done for FreeRTOS.elf) - the instructions in the toolchain make it clear.

For now I've just tested some examples in the FreeRTOS directory. I'm not aware of what the FETT platform is or how it is used, but you should be able to build it with the toolchain, encrypt the ELF, and run with this software package.

I updated the meeting notes file on Google Drive with some more details of what exactly the app is, since there seemed to be some confusion earlier in today's meeting.

Let me know if you face any other issues, thanks!

rtadros125 commented 4 years ago

The bad news is that I was unable to ping, curl, or lynx to the instance. But I will leave this to tmw because I still have to finish other things today. I will debug and ping you if I have questions.

You should ask Dylan about the differences. I assume your version includes his updates for the RNG, but not his updates for GDB support. Anyway, we anticipated this, and that's why we asked you guys to include the binaries generated during the AFI build: we noticed that each set of files works together, but the sets are not always interchangeable.

taru-verma commented 4 years ago

Yeah let us know whenever you get a chance to look at it.

You need to bring up the tap0 interface (sudo ip link set tap0 up), since the script doesn't do it automatically; that may or may not have been fixed in the latest versions. With this you should be able to ping it.

curl will print garbage output since the data is sent in some binary format. I add HTTP headers in the beginning, so any browser (such as lynx) should be able to interpret it and display the webpage at http://172.16.0.2:9443/help.

Yeah we did add the binaries, and this is the rest of the SW package. Not sure what breaks where but I can look into it from our end and update you.

rtadros125 commented 4 years ago

Can I get a link to this Google Drive document?

I opened the binaries PR. The only file that changed is the Firesim-f1 binary.

taru-verma commented 4 years ago

https://docs.google.com/document/d/1te2kDUOFEBzgE5DdL2swxhb9SN1uwkwIErmJfOS5nSA/edit

Okay. In the meanwhile I'll try and build everything (our toolchain and the app) with changes for the new features like RNG and try and run it with the latest Chisel P1 SW Package. I'll let you know how it goes.

rtadros125 commented 4 years ago

@taru-verma

  1. Please approve DARPA-SSITH-Demonstrators/SSITH-FETT-Binaries#33 when you see this message.
  2. I was able to ping and see the http page. I was able to see the help page too, but when I tried to do a query, it died. Is this normal?
  3. For logging during the bugbounty, you'll get all UART output, which shall contain every query sent based on how I see the debugging thing you have there. If you have a simple command/query to fetch the whole database in the end, and you think this would be useful, I can run this before dropping the instance and include it in the artifacts. Please lmk.

rtadros125 commented 4 years ago

@taru-verma I am sorry, but this doesn't seem to be working. When I lynx any query, it just dies. I see on the UART output for query lynx "http://172.16.0.2:9443/query?avg(recenttravel)%20where%20(zipcode==48103%20&&%20reqo2==1)":

DEBUG: fetch page @ `/query?avg(recenttravel)%20where%20(zipcode==48103%20&&%20reqo2==1)'
DEBUG:  pre-rem: `avg(recenttravel)%20where%20(zipcode==48103%20&&%20reqo2==1)'...
DEBUG: post-rem: `avg(recenttravel) where (zipcode==48103 && reqo2==1)'...
DEBUG: tokenizing `avg(recenttravel) where (zipcode==48103 && reqo2==1)'...

Or the other example lynx "http://172.16.0.2:9443/query?avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender==\"F\"))", from a different run:

DEBUG: fetch page @ `/query?avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender=="F"))'
DEBUG:  pre-rem: `avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender=="F"))'...
DEBUG: post-rem: `avg(recovered) where (reqo2==1 && (reqvent==0 || gender=="F"))'...
DEBUG: tokenizing `avg(recovered) where (reqo2==1 && (reqvent==0 || gender=="F"))'...

And then nothing. Dead. It doesn't even respond to pings after that, and lynx stays Waiting for response (I waited 10 minutes). Are you sure I have the working binaries? Did you try to get queries on the fresh instance you created? Please double check. Thanks.

rtadros125 commented 4 years ago

I think I figured out the adaptor problem you had with the script. I think the script does get the adaptor up, but too early for your processor. If we put a delay after the processor boots, so that the network hook is ready to detect the networkUp event, and after that we get the adaptor up, it works fine. I was able to integrate your app into fett.py. My plan is to hand the researcher an instance only if:

All of this was already integrated and seems to work perfectly. Except the queries. Please lmk if you have any comments. You can check the integration work on the branch integrate-michigan-p1.
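The ordering fix described above (wait for the processor to boot, pause, and only then bring the adaptor up) could look roughly like the sketch below. It is not the code on the integrate-michigan-p1 branch: the boot marker string and the delay value are hypothetical placeholders, since the thread gives neither.

```python
import subprocess
import time

BOOT_MARKER = "BOOT_DONE"  # hypothetical: whichever UART line signals end of boot
SETTLE_DELAY_S = 5.0       # hypothetical: the thread does not give a value


def bring_up_after_boot(uart_lines, marker=BOOT_MARKER, delay_s=SETTLE_DELAY_S,
                        iface="tap0", run=subprocess.run, sleep=time.sleep):
    """Bring the tap interface up only after the boot marker is seen.

    uart_lines is any iterable of UART output lines. Returns True when
    the interface command was issued, False when the marker never showed.
    """
    for line in uart_lines:
        if marker in line:
            # Give the target's network hook time to get ready before
            # the link-up event is generated.
            sleep(delay_s)
            run(["sudo", "ip", "link", "set", iface, "up"], check=True)
            return True
    return False
```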

taru-verma commented 4 years ago

Thanks for the detailed description of the issues! I'm gonna look into why the queries are crashing. In the meanwhile, I have also built our toolchain with the latest version of the Chisel P1 SW Package and our app seems to be running with all the new hardware additions.

I'll test it out on a fresh instance as well and push the binaries to the repo for you to try. I'll also provide the updated toolchain with support for the new hardware changes later today.

rtadros125 commented 4 years ago

Awesome, thanks. Finishing the integration work is of higher priority. I will wait for your PR on the binaries repo; please request my review when you open it so I can get a notification. Thanks.

taru-verma commented 4 years ago

@rtadros-Galois I've pushed the binaries incorporating latest hardware changes (RNG, encryption) to the umich_binaries branch of SSITH-FETT-Binaries. This should work with the latest Chisel P1 SW Package. I tested that on a fresh instance as well.

Still looking into the query issue, I'll push an update once that is done.

Will update the toolchain soon as well, thanks!

taru-verma commented 4 years ago

@rtadros-Galois SQL queries are working - a malloc had not been changed to pvPortMalloc which was causing issues. Please let me know if you're able to integrate this binary. I'll share the updated toolchain shortly as well. Thanks!

rtadros125 commented 4 years ago

@taru-verma Still not working for me. However, the umich_binaries branch's history seems messed up. Please tar all the binaries that are working for you plus the new FreeRTOS.elf, upload them to the S3 bucket, and send me the link. I will take care of our binaries repo, no worries. And I will delete the branch. Thanks.

taru-verma commented 4 years ago

@rtadros-Galois Binaries exported at s3://morpheus-toolchain-share/binaries_export_06-19-20.tar.gz. Please let me know if this works for you, thanks!

rtadros125 commented 4 years ago

@taru-verma better, thanks. I am getting:

[nix-shell:~/target-fett]$ lynx -source "http://172.16.0.2:9443/query?avg(recenttravel) where (zipcode==48103 && reqo2==1)"
<!DOCTYPE html>
<html><head><title>HTML Document</title><style>.tab { tab-size: 2 }</style></head><body><pre class="tab"><p>
{
    "avg(RecentTravel)": "0.508021390374332"
}
</p></body></html>

However, I am getting this:

[nix-shell:~/target-fett]$ lynx -source "http://172.16.0.2:9443/query?avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender==\"F\"))"
<!DOCTYPE html>
<html><head><title>HTML Document</title><style>.tab { tab-size: 2 }</style></head><body><pre class="tab"><p>
{
    "error": "ERROR: SQLite SQL parse error"
}
</p></body></html>

Can I get another working example? Thanks.
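Both responses above wrap a small JSON object in an HTML page. A sketch of pulling that object out for automated checks follows; `extract_result` is a hypothetical helper, and it assumes the response always has the shape shown above (one JSON object after `<body>`).

```python
import json


def extract_result(page):
    """Return the JSON object embedded in the HTML response as a dict.

    The search starts at <body> because the <style> element in the head
    also contains braces.
    """
    body = page.find("<body")
    if body == -1:
        raise ValueError("no <body> in response")
    start = page.find("{", body)
    end = page.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in response body")
    return json.loads(page[start:end + 1])
```

A result containing an `"error"` key, as in the second response, can then be told apart from one carrying an `avg(...)` value.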

rtadros125 commented 4 years ago

Here's the debug output:

DEBUG: fetch page @ `/query?avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender=="F"))'
DEBUG:  pre-rem: `avg(recovered)%20where%20(reqo2==1%20&&%20(reqvent==0%20||%20gender=="F"))'...
DEBUG: post-rem: `avg(recovered) where (reqo2==1 && (reqvent==0 || gender=="F"))'...
DEBUG: tokenizing `avg(recovered) where (reqo2==1 && (reqvent==0 || gender=="F"))'...
INFO: successful tokenization...
DEBUG: token[0]: {id:Token_AVG, val:avg}
DEBUG: token[1]: {id:Token_OPEN, val:(}
DEBUG: token[2]: {id:Token_field, val:Recovered}
DEBUG: token[3]: {id:Token_CLOSE, val:)}
DEBUG: token[4]: {id:Token_WHERE, val:where}
DEBUG: token[5]: {id:Token_OPEN, val:(}
DEBUG: token[6]: {id:Token_field, val:ReqO2}
DEBUG: token[7]: {id:Token_EQ, val:==}
DEBUG: token[8]: {id:Token_num, val:1}
DEBUG: token[9]: {id:Token_AND, val:&&}
DEBUG: token[10]: {id:Token_OPEN, val:(}
DEBUG: token[11]: {id:Token_field, val:ReqVent}
DEBUG: token[12]: {id:Token_EQ, val:==}
DEBUG: token[13]: {id:Token_num, val:0}
DEBUG: token[14]: {id:Token_OR, val:||}
DEBUG: token[15]: {id:Token_field, val:Gender}
DEBUG: token[16]: {id:Token_EQ, val:==}
DEBUG: token[17]: {id:Token_string, val:"F"}
DEBUG: token[18]: {id:Token_CLOSE, val:)}
DEBUG: token[19]: {id:Token_CLOSE, val:)}
INFO: REST request successfully parsed...
DEBUG: emitf = `select avg(Recovered), count(Recovered) from COVID19Data where (ReqO2 == 1 AND (ReqVent == 0 OR Gender == "F" ))'...
ERROR: SQL command error: no such column: F
DEBUG: emitRespbuf = `<!DOCTYPE html>
<html><head><title>HTML Document</title><style>.tab { tab-size: 2 }</style></head><body><pre class="tab"><p>
{
    "error": "ERROR: SQLite SQL parse error"
}
</p></body></html>

taru-verma commented 4 years ago

Hmmm, not sure why it's giving the "no such column" error; it might have something to do with how the argument is passed in the query.

Both examples on the help page should work; you can also try something like "http://172.16.0.2:9443/query?avg(recovered) where (zipcode == 48105)".

rtadros125 commented 4 years ago

Thanks. This one works fine with me. If lynx returns 0, and the stdout does not have the keyword ERROR or error, I will consider this as PASS. Agree?
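The PASS rule proposed above is small enough to write down directly. A sketch, with hypothetical function names; the only external command used is `lynx -source`, as in the earlier comments.

```python
import subprocess


def is_pass(returncode, stdout):
    """PASS when lynx exited with 0 and stdout has neither ERROR nor error."""
    return returncode == 0 and "ERROR" not in stdout and "error" not in stdout


def check_query(url, run=subprocess.run):
    """Fetch url with `lynx -source` and apply the PASS rule to the result."""
    result = run(["lynx", "-source", url], capture_output=True, text=True)
    return is_pass(result.returncode, result.stdout)
```

Note that the rule as stated is a substring match, so a legitimate result that happened to contain the word "error" would be reported as a failure.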

taru-verma commented 4 years ago

Yeah, I think so. I can check with Todd on any other PASS/FAIL conditions and update you if there's a change.

Also, feel free to delete the umich_binaries branch if it's messed up or has outlived its utility. We can communicate on how to share any future versions of the binaries.

Any idea on when the FreeRTOS development would be frozen? I can then merge those changes and create updated versions of the binary and toolchain (we make tiny changes to the BSPs since we added new instructions).

rtadros125 commented 4 years ago

Please consider it frozen. Any new updates will be irrelevant to your processor, and most probably surrounded by ifdefs that you won't define. I will let you know in case this changes, which is unlikely. Thanks.

I will repeat my question about logging. If you don't need anything other than the UART debug log, I can go ahead and close #233.

taru-verma commented 4 years ago

The UART debug log should be sufficient, and we don't need anything else. That issue can be closed, thanks!

taru-verma commented 4 years ago

@rtadros-Galois I've fixed something and uploaded a new version of the M3DB binaries at s3://morpheus-toolchain-share/m3db-binaries.tar.gz (only main_m3db.elf has been changed). In addition to the two sample queries on the help page, you can either create your own or try other examples like:

lynx "http://172.16.0.2:9443/query?avg(recovered) where (agerange == \"0-9\" && race == \"B\" && gender == \"M\")"
lynx "http://172.16.0.2:9443/query?avg(recovered) where (race == \"W\" && gender == \"F\")"
http://172.16.0.2:9443/query?avg(recovered) where (zipcode == 48105)

I've also made available the Michigan toolchain, FreeRTOS-10.0.1 (with all changes integrated as of 06-17-20), and a README on how to build and run apps on our platform. It is available at s3://morpheus-toolchain-share/morpheus-toolchain-06-21-20.tar.gz. Please note that the AGFI used for apps built through our toolchain would be different than the one available in the M3DB binary tarball as they use different encryption keys. The toolchain README has information on which one to use.

Please let me know if you have any questions, thanks!

rtadros125 commented 4 years ago

@taru-verma I've updated the reference in #457. I tested with the two queries I mentioned before, and they ran fine.

@immindich FYI the above comment. Especially the bucket with the updated toolchain.