Deniz-Eren / dev-can-linux

Porting of Linux CAN-bus drivers to QNX
GNU General Public License v2.0
5 stars 1 forks source link

Support QNX Standard MID encoding #13

Closed phicore closed 11 months ago

phicore commented 11 months ago

Hi,

We use the driver under QNX 7.1 with a Toradex IMX8QM SOM with a MiniPCIE PEAK CAN board.

One issue we had is when using standard MID the MID were not ok for us, it was ok for extended mode, but not for standard.

On QNX the encoding for the MID trough the standard devctl depends whether or not the driver is using extended MIDs:

In standard 11-bit MIDs, bits 18–28 define the MID. In extended 29-bit MIDs, bits 0–28 define the MID.

Check http://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.utilities/topic/c/canctl.html

Beside this, to work on our board we had to use the event mode and also correct a pci configuration space offset if I recall correctly.

Deniz-Eren commented 11 months ago

Hello!

That's a good find, thank you. Give me a couple days to look into this a little deeper.

So far I've only used extended MID applications thus why I never came across this issue, so your feedback is appreciated.

When you say you needed to use the event mode, can you please elaborate a little more for my understanding?

Best regards, Deniz

Deniz-Eren commented 11 months ago

Also can you please post the correction you needed to make for the pci configuration space offset?

It could help if you post the output of running the driver with options "-vvvvv" after building in debug mode (cmake -DCMAKE_BUILD_TYPE=Debug); otherwise 5 v's won't be allowed with a release version. With that output I can see the hardware device specific numbers etc.

phicore commented 11 months ago

The output of the driver startup in verbose debug is the following

dev-can-linux v1.0.26
dev-can-linux comes with ABSOLUTELY NO WARRANTY; for details use option `-w'.
This is free software, and you are welcome to redistribute it
under certain conditions; option `-c' for details.
driver start (version: 1.0.26)
Auto detected device (1c:8) successfully: (driver "peak_pci")
pci_enable_device: 1c:8
read ssvid: 1c
read ssid: 2
read cs: 0, slot: 0, func: 0, devfn: 0
read capability[0]: 1
read capability[1]: 5
read capability[2]: 10
read ba[0] MEM { addr: 62000000, size: 10000 }
io threshold: 0; I/O[0:0], MEM[62000000:62010000]
read ba[1] MEM { addr: 62010000, size: 10000 }
io threshold: 0; I/O[0:0], MEM[62000000:62020000]
read irq[0]: 105
pci_request_regions
probing device 001c:0008:0000
pci_iomap; bar: 0, addr: 62000000, max: 1000
ba[0] mapping successful
pci_iomap; bar: 1, addr: 62010000, max: 400
ba[1] mapping successful
1x CAN FW v1.3.0
setup_timer (2911a9f7f0)
netif_carrier_off
register_netdev: peak_pci-can0
setting BTR0=0x07 BTR1=0x14
  clock: 8000000Hz
  bitrate: 125000bits/second
  sample_point: 875 (1/10 of percent)
  tq: 500ns (TQ)
  prop_seg: 6
  phase_seg1: 7TQ
  phase_seg2: 2TQ
  sjw: 1TQ
  brp: 4
netif_carrier_on
request_irq; irq: 105, name: peak_pci-can0
netif_start_queue
resmgr_attach -> 0
resmgr_attach -> 1
peak_pci-can0 at reg_base=0x4cec2d0000 cfg_base=0x4cec2c0000 irq=105

We set CONFIG_QNX_INTERRUPT_ATTACH to 0 and CONFIG_QNX_INTERRUPT_ATTACH_EVENT to 1

and in the /kernel/driver/net/can/sja1000/peak_pci.c file

-587,14 +587,16 @@ static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
    if (err)
        goto failure_disable_pci;

-   err = pci_read_config_word(pdev, 0x2e, &sub_sys_id);
+   // On QNX, the device-specific PCIe configuration space range starts at
+   // 0x40.
+   err = pci_read_config_word(pdev, 0x40 + 0x2e, &sub_sys_id);
    if (err)
        goto failure_release_regions;

    dev_dbg(&pdev->dev, "probing device %04x:%04x:%04x\n",
        pdev->vendor, pdev->device, sub_sys_id);

-   err = pci_write_config_word(pdev, 0x44, 0);
+   err = pci_write_config_word(pdev, 0x40 + 0x44, 0);

Some other files are changed see zip file dev-can-linux.zip

overmighty commented 11 months ago

There might have been a mistake in the changes done to src/kernel/drivers/net/can/sja1000/peak_pci.c. The layout of the first 0x40 bytes of the PCI configuration space being standardized and the rest being device-specific is true no matter the operating system. The reason for pci_device_cfg_rd16() not accepting offsets below 0x40 seems to be that the standardized registers must be read from using the dedicated pci_deviceread*() functions. Adding 0x40 to the offsets given to pci_write_config_word() seems wrong: the debug message probing device 001c:0008:0000 shows that 0 was read as subsystem ID, contradicting the earlier read ssid: 2 line from pci_enable_device(), which uses pci_device_read_ssid() and stores the result in dev->subsystem_device: https://github.com/Deniz-Eren/dev-can-linux/blob/db1683b66218b3ccdb880475669a3e8341630fd2/src/pci.c#L210-L217

I would try these changes instead:

diff --git a/src/kernel/drivers/net/can/sja1000/peak_pci.c b/src/kernel/drivers/net/can/sja1000/peak_pci.c
index 6b9ea39..ff9c96c 100644
--- a/src/kernel/drivers/net/can/sja1000/peak_pci.c
+++ b/src/kernel/drivers/net/can/sja1000/peak_pci.c
@@ -587,12 +587,8 @@ static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
        if (err)
                goto failure_disable_pci;

-       err = pci_read_config_word(pdev, 0x2e, &sub_sys_id);
-       if (err)
-               goto failure_release_regions;
-
        dev_dbg(&pdev->dev, "probing device %04x:%04x:%04x\n",
-               pdev->vendor, pdev->device, sub_sys_id);
+               pdev->vendor, pdev->device, pdev->subsystem_device);

        err = pci_write_config_word(pdev, 0x44, 0);
        if (err)
Deniz-Eren commented 11 months ago

@phicore and @overmighty thank you for your feedback and contribution! It is greatly appreciated.

I have created a working branch for the peak_pci driver fixes here: https://github.com/Deniz-Eren/dev-can-linux/tree/fix-subsystem-device-peak-pci

@phicore thank you for the implementation of functions pci_read_config_word and pci_write_config_word. I have committed these to this working branch and commented your contribution in the commit message.

Regarding peak_pci.c changes that @phicore needed to do to get the peak card working, it could well be that @overmighty is right. That is regarding: https://github.com/Deniz-Eren/dev-can-linux/blob/db1683b66218b3ccdb880475669a3e8341630fd2/src/kernel/drivers/net/can/sja1000/peak_pci.c#L590-L592

@phicore had done:

    // On QNX, the device-specific PCIe configuration space range starts at
    // 0x40.
    err = pci_read_config_word(pdev, 0x40 + 0x2e, &sub_sys_id);

I have applied @overmighty your recommendation of: https://github.com/Deniz-Eren/dev-can-linux/blob/90e432c54db5e82c7041fb9c54ac347a696a5d00/src/kernel/drivers/net/can/sja1000/peak_pci.c#L593-L603

Definitely it seems the first call of pci_device_read_ssid did return a different value, so @phicore please try this working branch changes and see if they help.

Also, @overmighty do you think the same applies to the pci_write_config_word application? That is: https://github.com/Deniz-Eren/dev-can-linux/blob/db1683b66218b3ccdb880475669a3e8341630fd2/src/kernel/drivers/net/can/sja1000/peak_pci.c#L597-L599

@phicore had done:

    err = pci_write_config_word(pdev, 0x40 + 0x44, 0);
    if (err)
        goto failure_release_regions;

I have changed this to be: https://github.com/Deniz-Eren/dev-can-linux/blob/90e432c54db5e82c7041fb9c54ac347a696a5d00/src/kernel/drivers/net/can/sja1000/peak_pci.c#L608-L616

The reason I have used pci_device_write_cmd is because looking at the Linux Kernel code, the function pci_write_config_word with values 0x44 seem like commands. In Linux there are:

/*
 * Under PCI, each device has 256 bytes of configuration address space,
 * of which the first 64 bytes are standardized as follows:
 */
#define PCI_STD_HEADER_SIZEOF   64
#define PCI_STD_NUM_BARS        6       /* Number of standard BARs */
#define PCI_VENDOR_ID           0x00    /* 16 bits */
#define PCI_DEVICE_ID           0x02    /* 16 bits */
#define PCI_COMMAND             0x04    /* 16 bits */
#define  PCI_COMMAND_IO         0x1     /* Enable response in I/O space */
#define  PCI_COMMAND_MEMORY     0x2     /* Enable response in Memory space */
#define  PCI_COMMAND_MASTER     0x4     /* Enable bus mastering */
#define  PCI_COMMAND_SPECIAL    0x8     /* Enable response to special cycles */
#define  PCI_COMMAND_INVALIDATE 0x10    /* Use memory write and invalidate */
#define  PCI_COMMAND_VGA_PALETTE 0x20   /* Enable palette snooping */
#define  PCI_COMMAND_PARITY     0x40    /* Enable parity checking */
#define  PCI_COMMAND_WAIT       0x80    /* Enable address/data stepping */
#define  PCI_COMMAND_SERR       0x100   /* Enable SERR */
#define  PCI_COMMAND_FAST_BACK  0x200   /* Enable back-to-back writes */
#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */

So I guess the value 0x44 is a command that set PCI_COMMAND_PARITY and PCI_COMMAND_MASTER.

@phicore some other comments looking at your zip file you have uploaded.

Regarding changes in src/kernel/drivers/net/can/sja1000/sja1000.c:

    // ENGLAB CEC: Forcing 125Kbds, sampling point at 75%
    btr0 = 0x07;
    btr1 = 0x14;

See documentation of the driver in section Setting Bitrate. If you want to set particular btr0 and btr1 values you can do that using the command canctl and do not need to hardcode them in the driver.

Same regarding the changes made in src/resmgr.c, you can set the baud rate. One comment however, it looks like you are forcing the btr0 and btr1 to be particular values, however looking at QNX documentation for canctl it says these parameters should be specified -c freq[kKmM],bprm,ts1,ts2,sjw - do you think we really need functionality that allows forcing btr0 and btr1 directly at the driver level?

Looking at changes done in src/kernel/include/linux/io.h those seem like some special aarch64le architecture or x86 architecture checks are needed. I have merged support for aarch64 and rebased our working branch. Can you please check the peak_pci.c changes in this working branch, that stem from @overmighty's comments? I have not tested with aarch64le and do not have PEAK CAN card to test with unfortunately, so if you can test it would be great and we can rebase these changes to main branch to support your application going forward better.

Deniz-Eren commented 11 months ago

After merging Support for aarch64 architecture branch the ticket automatically closed, so re-opening to track remaining items.

Deniz-Eren commented 11 months ago

After merging Support QNX standard/extended MID configurations branch the ticket automatically closed, so re-opening to track remaining items. Original reported mishandling of standard MID messages bug should be addressed now.

Changes rebased to fix-subsystem-device-peak-pci branch pending verification by @phicore

Deniz-Eren commented 11 months ago

Looking at /root/qnx710/target/qnx7/usr/include/pci/pci.h I'm starting to think the solution perhaps might be like what @phicore had originally, by adding 0x40 to the addresses. Modifying peak_pci.c is not a good way forward, that should remain 100% the same as Linux version, the changes should be implemented within the functions pci_write_config_word() and pci_read_config_word().

QNX pci.h says:

/*
 * The following functions provide write access to the device dependent portion
 * of the 256/4096 byte PCI/PCIe configuration space from 0x40 to 0xFF/0xFFF.
 * The offset parameter must be aligned.
 * Pass NULL for the <val_p> parameter if you don't want the register value after
 * write returned
*/
pci_err_t pci_device_cfg_wr8(pci_devhdl_t hdl, uint_t offset, uint8_t val, uint8_t *val_p);
pci_err_t pci_device_cfg_wr16(pci_devhdl_t hdl, uint_t offset, uint16_t val, uint16_t *val_p);
pci_err_t pci_device_cfg_wr32(pci_devhdl_t hdl, uint_t offset, uint32_t val, uint32_t *val_p);
pci_err_t pci_device_cfg_wr64(pci_devhdl_t hdl, uint_t offset, uint64_t val, uint64_t *val_p);

The wording could be implying that we must add 0x40 to the "offset" when calling these functions. What do you think @overmighty ?

Deniz-Eren commented 11 months ago

Regarding the 0x40 offset issue for functions pci_read_config_word() and pci_write_config_word(), I now have 2 alternative solution branches/pull-requests, there are:

The architecture of this project is to port Linux CAN drivers to QNX, and as such the file peak_pci.c should not be modified and only implementations of pci_read_config_word() and pci_write_config_word() should be modified, such that the driver code is the same for all platform. This makes merging of new Linux changes to our project easier and more maintainable.

Therefore, I propose the solution to this issue is more aligned with what @phicore originally had so we should go with fix-subsystem-device-peak-pci-alternative. Also the QNX function comments for pci_device_cfg_rd() and pci_device_cfg_wr() do mention this 0x40 offset, but not clear what they mean; however at least we know adding 0x40 works from @phicore's testing and we see mention of this 0x40 in the QNX comments. At least it's not a total magic number; we can forward this question to QNX also nevertheless.

Any thoughts?

Deniz-Eren commented 11 months ago

@phicore everything in your zip file has been implemented now in main branch and fix proposal branches as mentioned above, except for following items:

Furthermore, I will start generating aarch64le and x86_64 release builds from now; once we are finalise the remaining items.

phicore commented 11 months ago

Thanks to @overmighty for chiming in. For information @overmighty was the one doing the changes in your driver. Since then, I found the issues with MIDs and wanted to inform about.

Setting of device_id_count in src/main :

We use the PEAK Pcie Card and your driver to create a 4th CAN interface in addition to the 3 flexcan we use from the iMX8QM. We had trouble adding it as a 4th CAN and this change was the way found to make it work. Normally, we would have loaded the driver with dev-can-linux -u id=3,rx=1,tx=1, with the change we launch it simply with dev-can-linux 3 &

To be complete, it seems your driver (or the changes we did to it) has still some issues to cohabit with the integrated FlexCan drivers. We get some strange additional spurious activity on the integrated FlexCan can interface when the dev-can-linux is loaded, even when we don't use it. With a stress test program that writes over the 4 cans at increasing rate (up to 500msg/s) we could not reproduce it, but we see the issue with a customer provided application. I cannot imagine how it is possible but by removing the peak card or simply not loading the driver the issue disappears.

Thanks for your work, I will give another tryout probably in 2 weeks from now.

phicore commented 11 months ago

In addition, about the baud rate, I was never able to get another bitrate than 250kbs by configuring the driver, that's why I hardcoded it at the lowest level. With configuration, the driver showed 125Kbds in the verbose output, but the interface definitively used 250kbds. Possibly there are moments when the baud rate changes cannot be applied.

Deniz-Eren commented 11 months ago

Thank you @phicore for the clarrification and yes thank you @overmighty for your work also.

There is one known issue regarding cards with PCIe capability 0x5 (MSI). Looking at your dev-can-linux -vvvvv output you had provided above, your card definitely supports this 0x5 (MSI) capability:

read capability[0]: 1
read capability[1]: 5     <---- THIS ONE
read capability[2]: 10

The problem you describe sounds pretty much like this issue.

To clarrify the situation a little, when using PCIe cards, we found (on an equivalent Linux platform) that when the MSI capability IRQs are available and when the driver does not use them, some issues arise. From what we can tell, the IRQ event is received before the chipset has data available to read (sometimes) and thus things get strange. The fix is believed to be the use the capability 0x5 (MSI) IRQs.

The good news is I have been working on a fix for this issue some time ago, but we have not received new hardware types that have MCI support to test it and complete it. The fix branch is this: Added PCI capability ID 0x5 (MSI) support.

Now that I see your board also have this MSI capability, I will review my solution in branch Added PCI capability ID 0x5 (MSI) support and make sure it'll work on your driver also. Currently the solution is around the adv_pci.c driver only, however now I see a better more generic solution can be made for peak_pci.c to benefit also.

As said, this MSI feature hasn't been tested yet and I'm still working on it.

I will keep working on this MSI branch and let you know when it's ready.

Deniz-Eren commented 11 months ago
  • btr0 and btr1

I had a feeling having a feature that allows forcing btr0 and btr1 at the program option level would be a good idea, so I'll implement this also. Will let you know when it's ready.

phicore commented 11 months ago

Thanks a lot for you explanation and you hard work, I will definitely make sure to test it out.

Deniz-Eren commented 11 months ago

No worries at all

overmighty commented 11 months ago

Looking at /root/qnx710/target/qnx7/usr/include/pci/pci.h I'm starting to think the solution perhaps might be like what @phicore had originally, by adding 0x40 to the addresses. Modifying peak_pci.c is not a good way forward, that should remain 100% the same as Linux version, the changes should be implemented within the functions pci_write_config_word() and pci_read_config_word().

QNX pci.h says:

/*
 * The following functions provide write access to the device dependent portion
 * of the 256/4096 byte PCI/PCIe configuration space from 0x40 to 0xFF/0xFFF.
 * The offset parameter must be aligned.
 * Pass NULL for the <val_p> parameter if you don't want the register value after
 * write returned
*/
pci_err_t pci_device_cfg_wr8(pci_devhdl_t hdl, uint_t offset, uint8_t val, uint8_t *val_p);
pci_err_t pci_device_cfg_wr16(pci_devhdl_t hdl, uint_t offset, uint16_t val, uint16_t *val_p);
pci_err_t pci_device_cfg_wr32(pci_devhdl_t hdl, uint_t offset, uint32_t val, uint32_t *val_p);
pci_err_t pci_device_cfg_wr64(pci_devhdl_t hdl, uint_t offset, uint64_t val, uint64_t *val_p);

The wording could be implying that we must add 0x40 to the "offset" when calling these functions. What do you think @overmighty ?

This is what I had initially thought too when reading the QNX documentation, but in hindsight I think it doesn't make sense. The PCI configuration space is 256 bytes and the PCI Express configuration space is 4096 bytes, whereas the range from 0x40 to 0xFF/0xFFF is only 192/4032 bytes. Adding 0x40 to the offsets means losing access to the last 0x40 bytes of the configuration space which the PCI Express specifications state nothing about, while the first 0x40 bytes are known to be a standardized header. QNX seemingly differs from Linux by requiring access to standardized and non-standardized configuration space registers to be done using two separate sets of functions instead of providing a single set of functions to access both.

0x44 is seemingly the offset of a configuration space register specific to PEAK cards and not a command, or the Linux driver would use pci_config_write_word() to write into the command register at offset 0x04 instead of writing at offset 0x44.

Deniz-Eren commented 11 months ago

0x44 is seemingly the offset of a configuration space register specific to PEAK cards and not a command, or the Linux driver would use pci_config_write_word() to write into the command register at offset 0x04 instead of writing at offset 0x44.

Yes you are right, I realized that later on, thus branch Experimental proposal only for peak pci issue won't work for the 0x40+0x44 case.

Let's think from the other side, peak_pci.c is directly from Linux and the original functions pci_write_config_word() and pci_read_config_word() are from Linux. So when peak_pci.c does this:

pci_read_config_word(pdev, 0x2e, &sub_sys_id);
OR
pci_write_config_word(pdev, 0x44, 0);

And we know that these addresses 0x2e and 0x44 work only when you give 0x40 as an offset, then Linux peak_pci.c used Linux functions for write/read are referring to addresses after the 0x40 header. So the QNX version refers to header included, which means QNX allows access to the header but the Linux versions of the read/write functions used in peak_pci.c do not. So I think your original solution is valid.

Also some more quotes from /root/qnx710/target/qnx7/usr/include/pci/pci.h

 * These functions should only be used if the corresponding information cannot
 * be obtained with a pre-existing library or capability module API. Also check
 * the capability module API's defined in <pci/cap_*.h> files

I guess these functions are QNX internal functions normally, but their usage in dev-can-linux is OK because this project is bridging the Linux drivers with minimal changes to QNX.

So branch https://github.com/Deniz-Eren/dev-can-linux/pull/17 should be solution to this one I vote.

Deniz-Eren commented 11 months ago

Reopening again after merging bitrate feature https://github.com/Deniz-Eren/dev-can-linux/pull/18

With this option you can now manually specify 125k baud with BTR0 and BTR1 set:

dev-can-linux -u id=0,freq=125k,btr0=0x07,btr1=0x14
Deniz-Eren commented 11 months ago

I can also verify setting of device_id_count in the zip file version is a valid concern, I had not considered the possibility of other QNX CAN drivers taking up /dev/can#/ file descriptors. I will think of a way to integrate this shortly.

Deniz-Eren commented 11 months ago

Added new -U option to specify device id offset, i.e. the device_id_count, which I renamed to next_device_id, which is what it is really.

    -U base_id - First id to use for devices detected
                 e.g. /dev/can9/ is id=9
                 Default: 0
Deniz-Eren commented 11 months ago

I've changed the default configs to align with your preferred option also, seems like a better way to go:

config/CONFIG_QNX_INTERRUPT_ATTACH=0
config/CONFIG_QNX_INTERRUPT_ATTACH_EVENT=1
Deniz-Eren commented 11 months ago

Reopening after merging https://github.com/Deniz-Eren/dev-can-linux/pull/17 This is the best fix we have so far, I will post a Foundry27 post on this topic also.

Deniz-Eren commented 11 months ago

@phicore to get all the fixes and changes, which aligns with all the changes needed to match your zip file version, just test branch https://github.com/Deniz-Eren/dev-can-linux/pull/8

Also, note this version has the PCIe MSI 0x5 capability implementation (we are pending testing this on our side) so be aware of IRQ behaviour changes in that branch. Once tested I will merge these.

Deniz-Eren commented 11 months ago

Foundry27 post regarding https://github.com/Deniz-Eren/dev-can-linux/pull/17

Deniz-Eren commented 11 months ago

Response from QNX (Foundry27 post) validates our +0x40 solution is the correct way to convert from Linux to QNX convention when it comes to these pci read/write functions.

Deniz-Eren commented 11 months ago

Closing this ticket since all items have been implemented.

@phicore please note feature https://github.com/Deniz-Eren/dev-can-linux/issues/7 is not ready for testing, it was discovered feature https://github.com/Deniz-Eren/dev-can-linux/issues/4 is also needed for PCIe MSI 0x05 capability to be fully supported.

Please keep an eye out for those 2 tickets and continue with your testing using the main branch in the meantime.