aws / aws-fpga

Official repository of the AWS EC2 FPGA Hardware and Software Development Kit

Using ARM processor in f1 FPGA instance #622

Closed: Gogul-N closed this issue 8 months ago.

Gogul-N commented 1 year ago

Hi,

I need to connect my design to an ARM processor, and I have to create two AFIs from the same design: one is a transmitter and the other is a receiver. Both have to be configured differently so that one acts as the transmitter and the other as the receiver. The problem is that the register addresses of the transmitter and receiver are the same, since both AFIs use the same offset addresses. I also want to do P2P transmission. How can I configure both when their registers have the same addresses? Is it also possible to connect the ARM processor to the AWS Shell? And how do I do P2P transmission? The example provided on GitHub is a little confusing.

Thank you.

czfpga commented 1 year ago

Hi @Gogul-N,

Can you please let me know which part of this page is confusing you?

It sounds like you have two AFIs with similar designs, register offsets, etc. In this case, you will need to launch at least an f1.4xlarge instance and load the two AFIs into two slots mapped in the same group, for example the transmitter AFI into slot 0 and the receiver AFI into slot 1. It's unclear which interface the design uses to "transmit" or "receive". Assuming you can "virtualize" it to use the PCIS/PCIM interfaces, you can do P2P transmission by targeting the peer card's 128 GB BAR4 space, which is dedicated to P2P.
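To target the peer card's BAR4 window from the PCIM interface, the Tx design needs the peer's BAR4 physical base address. The following is a minimal Python sketch. It assumes the host exposes the standard Linux sysfs `resource` file for each PCI device (one `start end flags` line per resource, so BAR4 is the fifth line). It also assumes that host software then hands the resulting address to the CL, for example through a register. That hand-off mechanism and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Tuple

P2P_WINDOW_BYTES = 128 * 1024**3  # BAR4 size quoted in this thread (128 GB)

def bar_range(resource_text: str, bar: int = 4) -> Tuple[int, int]:
    """Return (start, size) of one BAR from the contents of a sysfs 'resource' file."""
    lines = resource_text.strip().splitlines()
    start_s, end_s, _flags = lines[bar].split()
    start, end = int(start_s, 16), int(end_s, 16)
    if start == 0 and end == 0:
        raise ValueError(f"BAR{bar} is not assigned")
    return start, end - start + 1

def p2p_target(bar4_base: int, offset: int, length: int,
               bar4_size: int = P2P_WINDOW_BYTES) -> int:
    """Physical address a PCIM write should use to reach `offset` in the peer's BAR4."""
    if offset < 0 or offset + length > bar4_size:
        raise ValueError("access falls outside the peer BAR4 window")
    return bar4_base + offset

def read_bar_range(bdf: str, bar: int = 4) -> Tuple[int, int]:
    # bdf is the peer slot's PCI address, e.g. "0000:00:1d.0" (hypothetical value);
    # tools such as fpga-describe-local-image report it for each slot.
    return bar_range(Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text(), bar)
```

The sysfs route is only one way to discover the address. The point is that P2P writes are ordinary PCIM writes whose address falls inside the peer's BAR4 range.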

Regarding the ARM processor design, I'm assuming it is soft logic. It can certainly be implemented in the CL region as long as there is enough space for it to fit. I hope this helps. Please let me know if you still have questions.

Thanks,

Chen

Gogul-N commented 1 year ago

Hi @czfpga (Chen),

I am working with an f1.4xlarge instance only. Here is where I am confused: suppose the same AFI is loaded in both slots, and the register I want to configure is 0x30041001. Slot 0 has to be configured with the value 0x11001001 and slot 1 with the value 0x11000001. How can I achieve this?

Edit: I also have a question. Which port of the ARM processor has to be connected to the Shell, i.e., PCIM (BAR4)? Thank you.

czfpga commented 1 year ago

Hi @Gogul-N,

The two slots will have different BDFs (PCIe bus/device/function numbers). In this case, you write different values to the same register offset by going through each slot's own BDF.
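On the host, this means attaching to each slot separately and writing the slot-specific value through that slot's handle. In the aws-fpga C SDK this is roughly an `fpga_pci_attach()` per slot followed by `fpga_pci_poke()` on the returned handle. The Python sketch below only models the idea. A hypothetical `poke(slot, offset, value)` callable stands in for those calls, and a dict-backed fake replaces real hardware.

```python
from typing import Callable, Dict

# poke(slot, offset, value): hypothetical stand-in for fpga_pci_poke() on a handle
# that was attached to that slot's BAR beforehand.
Poke = Callable[[int, int, int], None]

REG_OFFSET = 0x30041001          # register offset from the question (same in both AFIs)
SLOT_VALUES = {0: 0x11001001,    # slot 0 -> transmitter configuration
               1: 0x11000001}    # slot 1 -> receiver configuration

def configure_slots(poke: Poke, offset: int, values: Dict[int, int]) -> None:
    """Write a slot-specific 32-bit value to the same register offset on each slot."""
    for slot, value in sorted(values.items()):
        poke(slot, offset, value & 0xFFFFFFFF)

class FakeSlots:
    """Test double: each slot (each BDF) has its own independent register space."""
    def __init__(self) -> None:
        self.regs: Dict[int, Dict[int, int]] = {}

    def poke(self, slot: int, offset: int, value: int) -> None:
        self.regs.setdefault(slot, {})[offset] = value
```

Because the offset is interpreted relative to each slot's own BAR, identical offsets in the two AFIs never collide. The slot (BDF) you attach to selects which card receives the write.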

Regarding the port, please note that F1 does NOT expose any physical I/O pins to CL designs. Therefore, you must "virtualize" the ports by connecting them to the provided CL interfaces to emulate I/O on the soft core. We have no visibility into your design, so you will need to decide which ports need to be connected.

Thanks,

Chen

Gogul-N commented 1 year ago

Hi @czfpga (Chen),

Thank you for the reply. I will explore P2P following your guidelines.

Actually, I have completed receiver validation on an F1 instance. Now I have to connect the ARM processor to it; it is soft-core logic only. My question is: which BAR of the Shell interface has to be connected to the ARM processor soft logic? Can I talk to my CL through the ARM core without the Shell interface, or do I have to go from the Shell to the ARM core to the CL?

Thank you

czfpga commented 1 year ago

Hi @Gogul-N,

Interfacing with the soft core in the CL has to be done through the Shell. The right option depends on the type of interface. If the number of signals is small and they're very simple, like DIP switches, you can consider using the miscellaneous signals. If you need access to the core's control and status registers, you might consider using the OCL or BAR1 interface (AppPF BAR0 or BAR1).
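The following is a rough behavioral model of the OCL option. The CL decodes AXI-Lite addresses arriving on the OCL interface and steers one sub-range of them to the soft core's control/status registers. Another offset drives "virtual" I/O bits, such as emulated DIP switches. The address map (CSR window at 0x0000-0x0FFF, virtual GPIO register at 0x1000) is entirely hypothetical. The sketch only illustrates the decode an RTL implementation would contain.

```python
from typing import Dict

CSR_BASE, CSR_SIZE = 0x0000, 0x1000   # hypothetical soft-core CSR window
VGPIO_OFFSET = 0x1000                 # hypothetical "virtual DIP switch" register

class OclDecoder:
    """Models CL-side address decode behind the OCL (AppPF BAR0) AXI-Lite port."""

    def __init__(self) -> None:
        self.core_csrs: Dict[int, int] = {}  # stands in for the soft core's register file
        self.vgpio = 0                        # bits fed into the soft core as if they were pins

    def write(self, addr: int, data: int) -> None:
        data &= 0xFFFFFFFF
        if CSR_BASE <= addr < CSR_BASE + CSR_SIZE:
            self.core_csrs[addr - CSR_BASE] = data
        elif addr == VGPIO_OFFSET:
            self.vgpio = data
        # Unmapped writes are dropped here; real RTL must still complete the
        # AXI-Lite handshake (e.g. with an OKAY or SLVERR response).

    def read(self, addr: int) -> int:
        if CSR_BASE <= addr < CSR_BASE + CSR_SIZE:
            return self.core_csrs.get(addr - CSR_BASE, 0)
        if addr == VGPIO_OFFSET:
            return self.vgpio
        return 0xDEADBEEF  # arbitrary "unmapped" pattern chosen for this sketch
```

Host software would reach these registers with ordinary 32-bit reads and writes on the AppPF BAR0 of the slot.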

Thanks,

Chen

Gogul-N commented 1 year ago

Hi @czfpga (Chen)

Thank you so much for the quick reply. I will continue as per your suggestion.

Thank you.

Gogul-N commented 1 year ago

Hi @czfpga (Chen)


This is my exact requirement: signals A (10 bits) and B (10 bits) from the transmitter should be connected to A (10 bits) and B (10 bits) of the receiver. How can I achieve this? Also, both the transmitter and the receiver will have an ARM soft core connected to them.

Edit: I am using an f1.4xlarge instance. I will use slot 0 as the transmitter and slot 1 as the receiver. As you suggested, the ARM soft core will be connected to BAR0 or BAR1, and I am using BAR4 to configure the registers in both the transmitter and the receiver.

Thank you.

czfpga commented 1 year ago

Hi @Gogul-N

Thank you for clarifying the application. As I mentioned before, F1 does NOT expose any physical I/O pins to CL designs.

So one option, if it's applicable, is to encapsulate the data patterns on signals A/B into AXI-MM transactions and forward those transactions to the PCIM interface on the Tx instance. Rx will receive them on its PCIS interface and decapsulate them before passing them to the receiver. However, this option might not work if signals A/B are expected to stream data continuously. F1 doesn't support any direct signal connection between slots.
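The following Python model makes the encapsulation concrete. Each A/B pair (10 bits each) fits in 20 bits, so one 32-bit word per sample is enough. Sixteen such words fill one 512-bit (64-byte) PCIS/PCIM data beat. The word layout (A in bits [9:0], B in bits [19:10]) and the 16-samples-per-beat batching are assumptions for illustration. The Tx CL would write each beat into the peer's BAR4 window, and the Rx CL would reverse the process on PCIS.

```python
from typing import List, Tuple

SAMPLES_PER_BEAT = 16   # 16 x 32-bit words = one 512-bit (64-byte) AXI beat

def pack_sample(a: int, b: int) -> int:
    """Pack one A/B pair (10 bits each) into a word: A in [9:0], B in [19:10]."""
    if not (0 <= a < 1 << 10 and 0 <= b < 1 << 10):
        raise ValueError("A and B must be 10-bit values")
    return (b << 10) | a

def unpack_sample(word: int) -> Tuple[int, int]:
    return word & 0x3FF, (word >> 10) & 0x3FF

def to_beats(samples: List[Tuple[int, int]]) -> List[bytes]:
    """Tx side: group packed samples into 64-byte beats, zero-padding the last one."""
    words = [pack_sample(a, b) for a, b in samples]
    words += [0] * (-len(words) % SAMPLES_PER_BEAT)
    return [b"".join(w.to_bytes(4, "little") for w in words[i:i + SAMPLES_PER_BEAT])
            for i in range(0, len(words), SAMPLES_PER_BEAT)]

def from_beats(beats: List[bytes], count: int) -> List[Tuple[int, int]]:
    """Rx side: recover the first `count` A/B pairs (padding words are discarded)."""
    words = [int.from_bytes(beat[i:i + 4], "little")
             for beat in beats for i in range(0, len(beat), 4)]
    return [unpack_sample(w) for w in words[:count]]
```

The receiver must know how many samples are valid (here the `count` argument). In hardware this would typically be signalled separately, for example with a length field or a status register.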

Thanks,

Chen

Gogul-N commented 1 year ago

Hi @czfpga (Chen),

Shall I use AXI GPIO for transmitting signals A and B as well as for receiving them? Actually, there will be continuous data transmission; the data will change every clock cycle.

Thank you.

czfpga commented 1 year ago

Hi @Gogul-N,

That will not work as you expect. All signals are handled by the Shell, and for P2P all data is ultimately encapsulated into PCIe packets to be transmitted to the P2P partner. There is NO direct GPIO connection between two slots.

Thanks,

Chen

Gogul-N commented 1 year ago

Hi @czfpga (Chen)

Thank you so much for your patience.

I have understood.

To communicate with the Shell, we need an AXI interface from the CL anyway. So how are signals A and B connected to the Shell without any interface to PCIM/PCIS? This is the part I am very confused about.

Thank you.

AWScsaralay commented 1 year ago

Hi,

As Chen stated previously, there cannot be any direct connection between the A/B ports of the TX block and the A/B ports of the RX block. One possible approach is shown in the attached diagram.

However, transferring only 10 bits at a time between two FPGAs will be extremely slow due to PCIe overhead. If your goal is to synthesize the TX and RX blocks in the same FPGA, it might be simpler to use a GPIO block or simple signaling/handshaking between the TX and RX blocks.
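The following back-of-the-envelope check shows why per-sample transfers are costly while batched ones are not. The numbers are assumptions, not measurements. It assumes a 250 MHz CL clock producing one 20-bit A/B sample per cycle, each sample carried in a 32-bit word. It also assumes roughly 26 bytes of header, sequence number, LCRC and framing overhead per write TLP; the real figure depends on header size and link generation.

```python
from typing import Tuple

CL_CLOCK_HZ = 250e6          # assumed CL clock; one A/B sample per cycle
SAMPLE_BITS = 20             # 10-bit A + 10-bit B
TLP_OVERHEAD_BYTES = 26      # assumed per-TLP overhead (varies by link/header type)

def required_payload_rate(clock_hz: float = CL_CLOCK_HZ, bits: int = SAMPLE_BITS) -> float:
    """Useful data rate generated by the A/B stream, in bytes per second."""
    return clock_hz * bits / 8

def link_efficiency(payload_bytes: int, overhead: int = TLP_OVERHEAD_BYTES) -> float:
    """Fraction of the bytes on the wire that are payload, for one write TLP."""
    return payload_bytes / (payload_bytes + overhead)

def link_load(samples_per_tlp: int, bytes_per_sample: int = 4,
              clock_hz: float = CL_CLOCK_HZ) -> Tuple[float, float]:
    """(write TLPs per second, bytes per second on the link) needed to keep up."""
    tlps_per_sec = clock_hz / samples_per_tlp
    wire_bytes = samples_per_tlp * bytes_per_sample + TLP_OVERHEAD_BYTES
    return tlps_per_sec, tlps_per_sec * wire_bytes
```

Under these assumptions, the stream carries 625 MB/s of useful data. Sending each sample in its own 4-byte write needs 250 million TLPs per second, with only about 13% of link bytes being payload. Batching 16 samples per 64-byte write cuts that to about 15.6 million TLPs per second at roughly 71% efficiency.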

Please let us know if you have any questions.

Thanks! Chakra

Gogul-N commented 1 year ago

Hi @AWScsaralay ,

Thank you so much for the detailed explanation. I will try to synthesize both blocks in the same FPGA if there is enough space. If not, I will go with two FPGAs.

Thank you.

Gogul-N commented 11 months ago

Hi @czfpga and @AWScsaralay ,

Please look at the attached image. How can I communicate with the ARM soft IP through the Shell? Kindly help me with this.


Edit: My CL has to be connected to the ARM core.

Thank you.

AWScsaralay commented 11 months ago

Hello,

The Arm core should be part of your CL design. The AWS Shell provides various AXI-Lite and AXI4 interfaces, so depending on how you would like to access the Arm core, you may need some kind of converter logic.

For example, you may use the Shell's PCIS AXI4 --> AXI4-to-AXI3 converter --> Arm core. But it really depends on the customer's intended functionality.
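One job such a converter performs is burst splitting. AXI4 INCR bursts may be up to 256 beats (8-bit AxLEN), while AXI3 allows at most 16 (4-bit AxLEN). The following is a minimal Python model of that splitting for INCR bursts only. A real converter, such as the AXI Protocol Converter IP in Vivado, also handles IDs, merging of write responses and other details omitted here. A data-width conversion may also be needed between the 512-bit PCIS bus and the core's port.

```python
from typing import List, Tuple

AXI3_MAX_BEATS = 16    # AXI3 AxLEN is 4 bits -> 1..16 beats
AXI4_MAX_BEATS = 256   # AXI4 AxLEN is 8 bits -> 1..256 beats (INCR)

def split_incr_burst(addr: int, beats: int, beat_bytes: int) -> List[Tuple[int, int]]:
    """Split one AXI4 INCR burst into AXI3-legal INCR bursts.

    Returns (address, AxLEN) pairs, where AxLEN = beats - 1 as encoded on the bus.
    Assumes an aligned start address and an input burst that does not cross a
    4 KB boundary (both AXI3 and AXI4 forbid that), so every piece stays legal too.
    """
    if not 1 <= beats <= AXI4_MAX_BEATS:
        raise ValueError("AXI4 INCR bursts carry 1..256 beats")
    if addr % beat_bytes:
        raise ValueError("this sketch only handles aligned start addresses")
    pieces = []
    while beats:
        n = min(beats, AXI3_MAX_BEATS)
        pieces.append((addr, n - 1))
        addr += n * beat_bytes
        beats -= n
    return pieces
```

For example, a 40-beat burst of 8-byte beats at 0x1000 becomes three AXI3 bursts starting at 0x1000, 0x1080 and 0x1100, with AxLEN values 15, 15 and 7.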

Please let us know if you need any additional details.

-Chakra

Gogul-N commented 11 months ago

Hi @AWScsaralay ,

Thank you for your reply. I will try and update here.

Thank you.