Bumblebee-Project / bbswitch

Disable discrete graphics (currently nvidia only)
GNU General Public License v2.0
487 stars 78 forks source link

[Lenovo IdeaPad Z510, GT 740M] Enabling the GPU won't work after resuming from sleep #142

Closed Barteks2x closed 7 years ago

Barteks2x commented 7 years ago

I already confirmed it on 2 completely different distributions: Debian jessie (and debian testing some time ago) and gentoo with kernel versions 3.16.0 on debian and 4.4.26 on gentoo.

After suspending, using optirun from bumblebee will freeze the whole system for a few seconds on debian (I didn't test bumblebee on gentoo), manually setting ON on both systems (echo ON > /proc/acpi/bbswitch as root) has no effect and outputs the following in dmesg: Refused to change power state, currently in D3

The laptop is Lenovo ideapad Z510 with Nvidia GeForce 740M GPU.

Lekensteyn commented 7 years ago

Got a dmesg from the event? Have you tried a newer kernel, like 4.7?

Barteks2x commented 7 years ago

Currently I switched to nouveau, but it also seems to have the same issue - after suspend the whole system freezes (it's not just no display, even ssh stops working), unless I boot it with nomodeset (then it just doesn't freeze everything but still no display, and the same message in dmesg).

So this isn't a problem only with bbswitch.

I would like to avoid installing nvidia drivers again on gentoo unless needed, and I have debian stable so updating kernel may be problematic.

I can give full dmesg output from debian (with bumblebee). I will also try with newer kernel if I find a way to install it on debian.

Barteks2x commented 7 years ago

I got a solution here: https://devtalk.nvidia.com/default/topic/955952/linux/struggling-with-my-geforce-gt-635m-and-the-nvidia-linux-driver-/

Ths solution is: Add acpi_osi="!Windows 2013" to kernel command line for it to work with kernel versions >3.14.

Lekensteyn commented 7 years ago

Please upload your acpidump/lspci/dmidecode according to the instruction on https://bugs.launchpad.net/lpbugreporter/+bug/752542

You should normally not have to set acpi_osi. There are known issues with newer Maxwell devices, but you should not be affected by that.

Barteks2x commented 7 years ago

The file (I'm not sure if it's needed here): https://bugs.launchpad.net/lpbugreporter/+bug/752542/+attachment/4773050/+files/LENOVO-20287.tar.gz

Lekensteyn commented 7 years ago

Based on that table, I can see that _PR3 support in _OSC is disabled when !"Windows 2013". Additionally there are some changes in how the USB 3 port is managed (should not be significant here...). However, the most significant change is that use of power resources are also disabled with !"Windows 2013".

Do you have a dmesg for both boots (with and without the acpi_osi parameter)?

What kernel versions have you tried? There is a known issue with 4.8 and the conditional used in your ACPI table, but that can be worked around with pcie_port_pm=off (note: only applicable/effective in Linux 4.8).

Barteks2x commented 7 years ago

I was getting the same issues issues with kernels 3.16.0 on debian, and versions 4.4.26 and 4.8.4 on gentoo.

Without making changes to my setup (switching drivers), I can get dmesg from debian with nvidia drivers apparently I have nouveau+bumblebee there (3.16.0 kernel) and 4.8.4 kernel from gentoo with nouveau (I removed the 4.4.26 one). I will provide link to both once I get them.

Barteks2x commented 7 years ago

I was 100% sure I have nvidia drivers on debian. I don't remember switching to nouveau...

Anyway, here are dmesg outputs:

I will also try pcie_port_pm=off, but I don't think it will work as I had the same issues with older kernels.

Lekensteyn commented 7 years ago

Ok, investigating. Also note that since your BIOS date is before 2015, you do not need pcie_port_pm=off as that is already the default for you.

Lekensteyn commented 7 years ago

Ok, the power resources difference could indeed be significant here. In both versions, you have this line in the default situation:

acpi LNXPOWER:01: Turning OFF
acpi LNXPOWER:00: Turning OFF

maybe it will be fixed in 4.10 or 4.11 (depending on whether a solution is ready/tested). Are you willing to test possible patches (provided by the ACPI maintainers of Linux)?

Related links are: https://bugs.freedesktop.org/show_bug.cgi?id=98398 http://www.spinics.net/lists/linux-acpi/msg70063.html (thread starts at http://www.spinics.net/lists/linux-acpi/msg70022.html)

For future reference, this is the information from dmidecode.txt from https://github.com/Bumblebee-Project/bbswitch/issues/142#issuecomment-258635410:

system-manufacturer   : LENOVO
system-product-name   : 20287
system-version        : Lenovo IdeaPad Z510
baseboard-manufacturer: LENOVO
baseboard-product-name: VIUU4
baseboard-version     : 31900058STD
bios-vendor           : LENOVO
bios-version          : 8DCN40WW
bios-release-date     : 10/24/2014

Can you test this patch with Linux 4.8 (and no acpi_osi nor pcie_port_pm=off options, with nouveau but no bbswitch):

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..8628669 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2201,11 +2201,11 @@ static bool pci_bridge_d3_possible(struct pci_dev *bridge)
            return true;

        /*
-        * It should be safe to put PCIe ports from 2015 or newer
-        * to D3.
+        * It should be safe to put PCIe ports from 2014 or newer
+        * to D3 as Windows 8 started supporting this.
         */
        if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
-           year >= 2015) {
+           year >= 2014) {
            return true;
        }
        break;

EDIT: actually... given that you have a module-level If-block, you would hit the bug linked above. In that case you probably want to test this patch on top of v4.9 (currently still in -rc) which has this commit: https://git.kernel.org/linus/de56ba95e8d6d760910711744a548b50b3a4262d

Barteks2x commented 7 years ago

I tested this patch with version 4.9-rc2, no difference (at least no visible difference). But now I see some ACPI related errors in dmesg that I don't remember seeing before (that may be from wifi as firmware for it no longer loads). Here is dmesg output: http://pastebin.com/YiVrqG4T (again, after running suspend with nouveau unloaded)

Lekensteyn commented 7 years ago

If you are referring to the ACPI EC messages, these are normal and unrelated. The only new warning I see is The lid device is not compliant to SW_LID., but that is also unrelated (it was added in torvalds/linux@dfa46c50f65b6ca10cea389327a6f1f1749bc633).

I hoped to see nouveau: detected PR support, will not use DSM, but due to the above "module-level code" bug it is possible that it does not work.

Can you install the iasl tools (I have tested 20160930-64) and patch your ssd5.dsl with this (_BCM fix was needed to fix a compile error, the commenting out part is the interesting change, the OemID version bump is needed for Linux):

--- ssdt5.dsl   2016-11-06 17:51:06.364158711 +0100
+++ ssdt5-hacked.asl    2016-11-06 18:01:25.563924220 +0100
@@ -18,7 +18,7 @@
  *     Compiler ID      "ACPI"
  *     Compiler Version 0x00040000 (262144)
  */
-DefinitionBlock ("", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
+DefinitionBlock ("", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000002)
 {
     External (_PR_.CPU0, ProcessorObj)
     External (_PR_.CPU0._PPC, IntObj)
@@ -53,7 +53,7 @@
     External (_SB_.PCI0.GFX0.DD01._DGS, MethodObj)    // 0 Arguments
     External (_SB_.PCI0.GFX0.DD02._ADR, MethodObj)    // 0 Arguments
     External (_SB_.PCI0.GFX0.DD02._BCL, IntObj)
-    External (_SB_.PCI0.GFX0.DD02._BCM, IntObj)
+    External (_SB_.PCI0.GFX0.DD02._BCM, MethodObj)
     External (_SB_.PCI0.GFX0.DD02._BQC, IntObj)
     External (_SB_.PCI0.GFX0.DD02._DCS, MethodObj)    // 0 Arguments
     External (_SB_.PCI0.GFX0.DD02._DGS, MethodObj)    // 0 Arguments
@@ -1012,8 +1012,7 @@

             Method (_BCM, 1, NotSerialized)  // _BCM: Brightness Control Method
             {
-                Return (\_SB.PCI0.GFX0.DD02._BCM) /* External reference */
-                Arg0
+                \_SB.PCI0.GFX0.DD02._BCM (Arg0) /* External reference */
             }
         }

@@ -2327,8 +2326,7 @@
         }
     }

-    If (\_OSI ("Windows 2013"))
-    {
+    // If (\_OSI ("Windows 2013")) {
         Scope (\_SB.PCI0.PEG1)
         {
             Name (WKEN, Zero)
@@ -2488,6 +2486,6 @@
                 _STA = One
             }
         }
-    }
+    // }
 }

Then compile with iasl ssdt5-hacked.asl which should produce a ssdt5-hacked.aml file. See https://www.kernel.org/doc/Documentation/acpi/initrd_table_override.txt for the next steps. If you have difficulties with creating the AML file or initrd, let me know, then I can prepare an initrd for you.

Barteks2x commented 7 years ago

Um... first need to figure out how to actually make my system use initrd - I don't have initrd at all (it's not there in /boot and I didn't find it in grub.cfg). So it may take me a while. Or I'm just very confused.

Lekensteyn commented 7 years ago

@Barteks2x It should be sufficient to create an initrd with just that file:

mkdir -p kernel/firmware/acpi
cp ssdt5-hacked.aml kernel/firmware/acpi/
find kernel | cpio -H newc --create > acpi-initrd

then copy the initrd to some place (e.g. /boot/acpi-initrd) and modify grub.cfg as follows:

linux /boot/vmlinuz root=...
# new:
initrd /boot/acpi-initrd

I test this with QEMU, it seems to work, see the 00000002 in dmesg:

ACPI: SSDT 0x000000007FFDA000 003E43 (v01 LENOVO CB-01    00000002 INTL 20160930)
Barteks2x commented 7 years ago

I added that initrd file, and I saw similar message in dmesg, but it still didn't fix the issue, this is dmesg output: http://pastebin.com/LuTLQnU1

Or should I remove the previous patch first?

It's also possible that the patch got applied incorrectly, as I had slightly different version of iasl so the decompiled code wasn't exactly the same.

Lekensteyn commented 7 years ago

So, for some reason Lenovo has decided to give all tables the same name. The ACPI table override functionality matches possible candidates by signature, OEM ID and OEM Revision ID which are all the same. As a result, the wrong SSDT is overridden.

As a workaround, patch the kernel (drivers/acpi/tables.c) and add this line after the Only override tables matched block:

if (existing_table->checksum != 0xAF) {
    /* Note: the next line is missing in the second dmesg below, causing a WARNING. */
    acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
    pr_info("Skipping next table\n");
    goto next_table;
}

You have previously posted dmesgs at https://github.com/Bumblebee-Project/bbswitch/issues/142#issuecomment-258678472 and linked a new one after a discussion in #bumblebee at Freenode. For comparison:

NOTE: dmesg does not contain nouveau: detected PR support, will not use DSM (and pr_info messages are actually shown based on the acpi printouts), so pcie_port_pm=off has no effect. TODO: figure out why port PM is not supported?

Takeaway for reporter:

To-do for Linux ACPI developers (I will contact them later):

To-do for self:

Barteks2x commented 7 years ago

I tested the same SSDT override and tables.c patch with year patch on 4.8.4 kernel.

The result: it seems to work both with and without year patch, dmesg outputs:

This did not work however:

zetalog commented 7 years ago

The table headers are:

DefinitionBlock ("dsdt.aml", "DSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
DefinitionBlock ("ssdt1.aml", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
DefinitionBlock ("ssdt2.aml", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
DefinitionBlock ("ssdt3.aml", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
DefinitionBlock ("ssdt4.aml", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)
DefinitionBlock ("ssdt5.aml", "SSDT", 1, "LENOVO", "CB-01   ", 0x00000001)

The SSDTs will be deemed as different versions of the same table from linux ACPI's point of view. I'm actually about to change acpi_tb_compare_tables() to use the id comparison instead of the entire table comparison. And the change may affect behavior of loading these tables.

Let me ask further which behavior OSPM should take to these tables:

  1. Install only ssdt1.aml and stop installing ssdt2-5.aml because of ID matches, and load only ssdt1.aml into the namespace?
  2. Install ssdt1.aml, then override ssdt1.aml with ssdt2.aml, then override ssdt2.aml with ssdt3.aml, then override ssdt3.aml with ssdt4.aml, then override ssdt4.aml with ssdt5.aml because of ID matches, and load only ssdt5.aml into the namespace?
  3. Install ssdt1.aml,ssdt2.aml,ssdt3.aml,ssdt4.aml,ssdt5.aml and load all of them into the same namespace?

Can you confirm Windows behavior for this via amli?

zetalog commented 7 years ago
if (existing_table->checksum != 0xAF) {
    /* Note: the next line is missing in the second dmesg below, causing a WARNING. */
    acpi_os_unmap_memory(table, ACPI_HEADER_SIZE);
    pr_info("Skipping next table\n");
    goto next_table;
}

Maybe we need to extend table override mechanism to allow override one of the tables by matching checksum. Well, Lenovo guys are really crazy! How can these tables be identified then...

zetalog commented 7 years ago
-    If (\_OSI ("Windows 2013")) {
+    // If (\_OSI ("Windows 2013")) {

I haven't confirmed if _OSI working in module level. Maybe we need to move it a bit earlier in Linux before enabling MLC.

Lekensteyn commented 7 years ago

Maybe we need to extend table override mechanism to allow override one of the tables by matching checksum.

See also the "TODOs" at https://github.com/Bumblebee-Project/bbswitch/issues/142#issuecomment-258719818 :-) In a reply to your mail I also suggested a possible method that looks at the filename for overriding based on other attributes. (checksum and size are different between the original and override table, so you cannot use the header for comparison).

Well, Lenovo guys are really crazy!

That is already known by now, they have several deficiencies in the past with relation to hybrid graphics (and other topics like Superfish). For example, memory corruption... #78.

Based on a test with QEMU:

qemu-system-x86_64 -M pc,accel=kvm -kernel arch/x86/boot/bzImage -nographic \
    -serial file:/dev/stdout -append 'console=ttyS0 acpi.aml_debug_output=1' \
    -acpitable file=test.aml | grep Debug -iC5

it seems that _OSI is supported. With this ASL:

DefinitionBlock ("", "SSDT", 1, "TESTJE", "MYTESTJE", 0x00001000) {
    External (\_SB.PCI0.PX13, DeviceObj)

    Debug = "Module level"
    If (\_OSI ("Windows 2013")) {
        Debug = "In block"

        Scope (\_SB.PCI0.PX13) {
            Method (_S0W, 0, NotSerialized) {
                Debug = "_SOW"
                Return(3)
            }
        }
    }
}

I get this output on v4.9-rc4:

[    0.072031] ACPI: Added _OSI(Module Device)
[    0.072923] ACPI: Added _OSI(Processor Device)
[    0.073865] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.074861] ACPI: Added _OSI(Processor Aggregator Device)
[    0.076238] [ACPI Debug]  "In block"
[    0.077030] ACPI: Executed 2 blocks of module-level executable AML code
[    0.080149] ACPI: Interpreter enabled
[    0.080945] ACPI: (supports S0 S5)
[    0.081671] ACPI: Using IOAPIC for interrupt routing
[    0.082723] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.084084] [ACPI Debug]  "_SOW"
[    0.086642] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.088004] acpi PNP0A03:00: _OSC: OS supports [Segments]
[    0.089209] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM

and with acpi_gbl_parse_table_as_term_list set to TRUE:

[    0.072001] ACPI: Added _OSI(Processor Aggregator Device)
[    0.074047] [ACPI Debug]  "Module level" <------------------ NEW
[    0.074927] [ACPI Debug]  "In block"
[    0.075740] ACPI: 2 ACPI AML tables successfully acquired and loaded
[    0.077996] ACPI: Interpreter enabled
[    0.079003] ACPI: (supports S0 S5)
[    0.080002] ACPI: Using IOAPIC for interrupt routing
[    0.081590] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.084142] [ACPI Debug]  "_SOW"
[    0.089026] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
zetalog commented 7 years ago

Whatever the block is executed right in table loading:

[    0.074047] [ACPI Debug]  "Module level"
[    0.074927] [ACPI Debug]  "In block"

or deferred after loading table:

[    0.076238] [ACPI Debug]  "In block"

The block should have been executed:

    If (\_OSI ("Windows 2013"))
    {
        Scope (\_SB.PCI0.PEG1)
        {
            Name (WKEN, Zero)
            OperationRegion (PCNV, SystemMemory, \_SB.PCI0.PEG1.PEGP.EBAS, 0x1000)
            Field (PCNV, AnyAcc, NoLock, Preserve)
            {
                Offset (0x488), 
                    ,   25, 
                MLTF,   1
            }

            Name (_PR0, Package (0x01)  // _PR0: Power Resources for D0
            {
                NVP3
            })
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })
            Name (_PR3, Package (0x01)  // _PR3: Power Resources for D3hot
            {
                NVP3
            })
            Method (_S0W, 0, NotSerialized)  // _S0W: S0 Device Wake State
            {
                If (And (\_SB.OSCO, 0x04))
                {
                    Return (0x04)
                }
                Else
                {
                    Return (0x03)
                }
            }

            Method (_DSW, 3, NotSerialized)  // _DSW: Device Sleep Wake
            {
                If (Arg1)
                {
                    Store (Zero, WKEN) /* \_SB_.PCI0.PEG1.WKEN */
                }
                Else
                {
                    If (LAnd (Arg0, Arg2))
                    {
                        Store (One, WKEN) /* \_SB_.PCI0.PEG1.WKEN */
                    }
                    Else
                    {
                        Store (Zero, WKEN) /* \_SB_.PCI0.PEG1.WKEN */
                    }
                }
            }

            Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
            {
            }

            Method (_PS3, 0, NotSerialized)  // _PS3: Power State 3
            {
            }
        }

        Name (MSD3, Zero)
        PowerResource (NVP3, 0x00, 0x0000)
        {
            Name (_STA, One)  // _STA: Status
            OperationRegion (PEGB, SystemMemory, \_SB.PCI0.PEG1.PEGP.DBPA, 0x0100)
            Field (PEGB, ByteAcc, NoLock, Preserve)
            {
                Offset (0x04), 
                PCMR,   8, 
                Offset (0x84), 
                PMST,   2
            }

            OperationRegion (DGRS, SystemMemory, \_SB.PCI0.PEG1.PEGP.EBAS, 0x50)
            Field (DGRS, DWordAcc, NoLock, Preserve)
            {
                DDID,   16, 
                Offset (0x40), 
                SSSV,   32
            }

            Method (_ON, 0, NotSerialized)  // _ON_: Power On
            {
                If (LEqual (DDID, 0xFFFF))
                {
                    Store (One, _STA) /* \NVP3._STA */
                    \_SB.PCI0.PEG1.PEGP.SGPO (\_SB.PCI0.PEG1.PEGP.HLRS, One)
                    \_SB.PCI0.PEG1.PEGP.SGPO (\_SB.PCI0.PEG1.PEGP.PWEN, Zero)
                    \_SB.PCI0.PEG1.PEGP.SGPO (0x34, Zero)
                    Sleep (0x10)
                    \_SB.PCI0.PEG1.PEGP.SGPO (\_SB.PCI0.PEG1.PEGP.HLRS, Zero)
                    Sleep (0x10)
                    Store (Zero, \_SB.PCI0.PEG1.LNKD)
                    While (LLess (\_SB.PCI0.PEG1.LNKS, 0x07))
                    {
                        Store (0x20, Local0)
                        While (Local0)
                        {
                            If (LLess (\_SB.PCI0.PEG1.LNKS, 0x07))
                            {
                                Stall (0x64)
                                Decrement (Local0)
                            }
                            Else
                            {
                                Break
                            }
                        }
                    }

                    Store (0x07, PCMR) /* \NVP3.PCMR */
                    Store (Zero, PMST) /* \NVP3.PMST */
                    While (LEqual (DDID, 0xFFFF))
                    {
                        Sleep (One)
                    }

                    Store (0x380A17AA, SSSV) /* \NVP3.SSSV */
                    Store (Zero, \_SB.PCI0.PEG1.PEGP.MLTF)
                    Store (Zero, MSD3) /* \MSD3 */
                    If (\ECON)
                    {
                        Store (One, \_SB.PCI0.LPCB.EC0.GATY) /* External reference */
                    }
                }
            }

            Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
            {
                If (LEqual (MSD3, Zero))
                {
                    If (\ECON)
                    {
                        Store (0x03, \_SB.PCI0.LPCB.EC0.GATY) /* External reference */
                    }

                    \_SB.PCI0.PEG1.PEGP.SGPO (\_SB.PCI0.PEG1.PEGP.HLRS, One)
                    Sleep (0x02)
                    \_SB.PCI0.PEG1.PEGP.SGPO (0x34, One)
                    \_SB.PCI0.PEG1.PEGP.SGPO (\_SB.PCI0.PEG1.PEGP.PWEN, One)
                    Store (Zero, _STA) /* \NVP3._STA */
                    Store (0x03, MSD3) /* \MSD3 */
                }
            }
        }

        PowerResource (NVP2, 0x00, 0x0000)
        {
            Name (_STA, One)  // _STA: Status
            Method (_ON, 0, NotSerialized)  // _ON_: Power On
            {
                Store (One, _STA) /* \NVP2._STA */
            }

            Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
            {
                Store (One, _STA) /* \NVP2._STA */
            }
        }
    }

So why is this change needed to fix the issue:

-    If (\_OSI ("Windows 2013")) {
+    // If (\_OSI ("Windows 2013")) {
zetalog commented 7 years ago

In a reply to your mail I also suggested a possible method that looks at the filename for overriding based on other attributes. (checksum and size are different between the original and override table, so you cannot use the header for comparison).

TBH, I think index of the table is not stable. For dynamic loading tables, index can be vary per OS boot. While checksum may be stable as they are likely not identical. We can disable table checksum validation for overridden tables and add an option in iasl to set checksum instead of calculating checksum for the compiled tables.

zetalog commented 7 years ago

Fix module-level AML codes, it still does not work as of v4.9-rc2

What do you want us to fix? We are about to enable acpi_gbl_parse_table_as_term_list = TRUE. However, we'll do that after clarifying a Windows behavior against automatic _REG execution.

If you looked at upstream ASLTS test cases, you would find a 'table' test suites, in which TLD0.tst9 was still broken. We need to confirm if the test is not valid or the auto _REG needs a better support.

Lekensteyn commented 7 years ago

We can disable table checksum validation for overridden tables and add an option in iasl to set checksum instead of calculating checksum for the compiled tables.

That is such a hack, I do not think it is a good idea because it will confuse other tools due to bad checksum. I think passing the matching parameters in the filename is a better idea. So instead of kernel/firmware/acpi/ssdt.aml you could have kernel/firmware/acpi/checksum=0xff,ssdt.aml and/or (for matching multiple attributes) kernel/firmware/acpi/checksum=0xff,size=1234,ssdt.aml.

What do you want us to fix?

I thought that MLC was related to this problem, but maybe there is another bug in how power resources are evaluated. See the tests below.

So why is this change needed to fix the issue:

-    If (\_OSI ("Windows 2013")) {
+    // If (\_OSI ("Windows 2013")) {

I do not know why it matters. I just ran some tests and using the QEMU command line and ASL from http://www.spinics.net/lists/linux-acpi/msg70069.html and using acpidbg -b Methods I verified that the methods inside the block are created:

# /tmp/acpidbg -b 'Methods' | grep NVP
                       \NVP3._ON Method       ffff880006325540 02 Args 0 Len 0014 Aml ffffc90000005133
                      \NVP3._OFF Method       ffff880006325578 02 Args 0 Len 0015 Aml ffffc9000000514e
                       \NVP2._ON Method       ffff880006325620 02 Args 0 Len 0014 Aml ffffc9000000517b
                      \NVP2._OFF Method       ffff880006325658 02 Args 0 Len 0015 Aml ffffc90000005196

but the power resource method itself is not visible in most cases:

# grep S90 /sys/bus/pci/devices/*/firmware_node/path
/sys/bus/pci/devices/0000:00:12.0/firmware_node/path:\_SB_.PCI0.S90_
# ls -d /sys/bus/pci/devices/*/firmware_node/power_resources*
ls: cannot access '/sys/bus/pci/devices/*/firmware_node/power_resources*': No such file or directory

only in one case (no If, term_list setting is false) I see the desired result:

/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D0
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D2
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D3hot

Tested scenarios with v4.9-rc4:

ASL acpi_gbl_parse_table_as_term_list Result
If(\_OSI("Windows 2013")) TRUE fail
If(1) TRUE fail
no If TRUE fail
If(\_OSI("Windows 2013")) FALSE fail
If(1) FALSE fail
no If FALSE pass

Very strange results, any idea why this would happen? I did not expect that term_list=TRUE would break the normal case where no If exists.

You said you do not have QEMU, what environment (OS/distro/hardware) are you using? QEMU is really useful for reproducing such tests with Linux.

zetalog commented 7 years ago

I think passing the matching parameters in the filename is a better idea.

I'll check the idea with other ACPI developers. But if you check Linux source code, the checksum validation mechanism has never been enabled. Because verifying checksum requires mapping of the whole table during the table installation (early) while x86 currently doesn't allow us to map giant tables during the early stage.

Very strange results, any idea why this would happen? I did not expect that term_list=TRUE would break the normal case where no If exists.

Is that possible to create ASLTS for this case?

You said you do not have QEMU

This is minor. I do have qemu environments to do Windows validation. It's just because I'm travelling, cannot run big windows VM on slow/low memory laptops. So if something is urgent, I have to ask the others to do it for me.

But qemu cannot tell us how Windows evaluates tables. I only saw the result once from the amli output you've provided to me. I don't have genuine Windows checked build images (actually it doesn't have to be genuine if we don't validate networking behavior :), I just do not have any such Windows software from my current employer) to build an amli environment.

zetalog commented 7 years ago

What's the result in the passed case of the following command:

# /tmp/acpidbg -b 'Methods' | grep NVP
zetalog commented 7 years ago

Just checking back the comments.

Ths solution is: Add acpi_osi="!Windows 2013" to kernel command line for it to work with kernel versions >3.14.

So I was thinking it is required to change "If (_OSI("Windows 2013"))" to "If (0)" to fix the issue. How did you fix it by changing to "If (1)"? And your test results did mean "If (1)":

[    0.074047] [ACPI Debug]  "Module level"
[    0.074927] [ACPI Debug]  "In block"
..
[    0.076238] [ACPI Debug]  "In block"

TBH, I've already lost in these comments.

Lekensteyn commented 7 years ago

Is that possible to create ASLTS for this case?

I will try.

As for QEMU, I use it for checking with Linux and that does not need many resources. I think the default RAM size is 64MB (when not specifying the -m option). One page in the webbrowser might already consume more :)

So I was thinking it is required to change "If (_OSI("Windows 2013"))" to "If (0)" to fix the issue. How > did you fix it by changing to "If (1)"?

Changing to If (0) prevents power resources from being added to the namespace and also prevents _OSC from reporting PR3 support, see https://github.com/Bumblebee-Project/bbswitch/issues/142#issuecomment-258675257. It is a workaround that makes the driver happy, but if you look closer in /sys, then you will find that the power resources are missing.

And your test results did mean "If (1)":

[    0.074047] [ACPI Debug]  "Module level"
[    0.074927] [ACPI Debug]  "In block"
..
[    0.076238] [ACPI Debug]  "In block"

Well, it shows that the code is somehow evaluated by ACPI and indeed, if you look at the namespace (acpidbg -b Methods | grep NVP), you will see that the power resources are still added to the namespace. But still there is no power_resource_D0 directory in sysfs.

zetalog commented 7 years ago

As for QEMU, I use it for checking with Linux and that does not need many resources. I think the default RAM size is 64MB (when not specifying the -m option). One page in the webbrowser might already consume more :)

I need qemu to check with Windows behavior, not Linux, we know what Linux is going to do.

  1. Running Win7 on 64MB is very slow, I cannot complete the installation process in time (takes hours).
  2. Increasing memory requires (at least 2GB-4GB is required) increased image size, while I don't have a laptop, after installing the other systems, still can have enough disk space to store the image. They are normallly 32GB/64GB SSD machines, and Windows consumes 20-40G, Linux consumes 10G. And I have development environments to put on it.

I think you haven't replied this: What's the result in the passed case of the following command:

# /tmp/acpidbg -b 'Methods' | grep NVP

Is that same as the failure one?

zetalog commented 7 years ago

I'm not sure if the order matters:

    If (1)
    {
        Scope (\_SB.PCI0.PEG1)
        {
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })
        }
    }
    PowerResource (NVP2, 0x00, 0x0000)
    {
        ...
    }

I can remember that Windows returns blue screen for a similar case (not Package, but Region, shown below). I'll try the Package again when I'm back. Our conclusion has already been made: there is no forward reference supported in AML grammar. It mustn't work, if accessing an object is allowed at module level, how can it contain forward reference that cannot be solved sequentially at the access time. For example:

    If (1)
    {
        Scope (\_SB.PCI0.PEG1)
        {
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })
            Store (_PR2, Debug)
        }
    }
    PowerResource (NVP2, 0x00, 0x0000)
    {
        ...
    }

Supporting forward reference seems to obviously break all such correct cases (we firstly need the working Store statement) as this looks like a paradox. I'm sure the rule is that we needn't support any Windows incompliant behaviors.

I think the first of all is: does Windows support this? It entirely doesn't matter if the old Linux supports a wrong behavior while the new Linux doesn't. Could you try this case with Windows?

If Windows doesn't support this, that's the wrong code working for old wrong Linux ACPI implementation. We needn't solve it. It's simply a BIOS bug.

The similar tested Region forward reference case is: https://github.com/acpica/acpica/commit/4af71b11e6 We just fixed the wrong ASLTS case rather than changing the ACPI implementation to make the wrong case working. We have verified by running Windows on qemu. Supporting region forward reference simply breaks region accesses at the module level. As we know SystemIo/SystemMemory can contain BIOS configurations to be accessed during table loading. How can a field of such an operation region be accessed without resolving the attributes of the field? It's obviously a paradox for OperationRegion field case.

The reported case (forward reference in Package) might be different case than region. We still need the Windows behavior confirmation before making any further decision.

Lekensteyn commented 7 years ago

The output of acpidbg was already given, but now I have tested it again (without any If(...), i.e. the "pass" case) and see:

# /tmp/acpidbg -b Methods|grep NVP
                       \NVP3._ON Method       ffff880006075540 02 Args 0 Len 0014 Aml ffffc90000005124
                      \NVP3._OFF Method       ffff880006075578 02 Args 0 Len 0015 Aml ffffc9000000513f
                       \NVP2._ON Method       ffff880006075620 02 Args 0 Len 0014 Aml ffffc9000000516c
                      \NVP2._OFF Method       ffff880006075658 02 Args 0 Len 0015 Aml ffffc90000005187
# ls -d /sys/bus/pci/devices/*/firmware_node/power_resources*
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D0
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D2
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D3hot

I suggested testing Linux with QEMU because the ACPI table can easily be modified without using the Linux ACPI table override method (the same QEMU -acpitable method can be used to test with other OSes like Windows, so you know for sure that both OSes see exactly the same ACPI tables).

I have a Win7 image which has a compressed qcow2 image of 5.1G (virtual size is 12G). It can reproduced with https://github.com/Lekensteyn/windows-bootstrap. However I am testing with Windows 10 (VirtualBox image from https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/, converted to compressed qcow2 is 5.2G). It is not that bad.

I tested the ACPI table from http://www.spinics.net/lists/linux-acpi/msg70069.html again (which has the structure in your last comment), the results are:

ASL Win10 Win 7
no extra SSDT pass pass
If(\_OSI("Windows 2013")) pass pass
no If pass pass

pass means, can boot to login screen without BSOD. Do you have more test cases?

This is the QEMU command I use (create new disk image based on original (win10-ie11.qcow2) and boot it):

qemu-img create -b win10-ie11.qcow2 -f qcow2 w.qcow2 &&
qemu-system-x86_64 -M pc,accel=kvm -net none -m 1G -drive file=w.qcow2 \
    -device pci-bridge,addr=12.0,chassis_nr=2,id=head.21 -acpitable file=pr.aml

For the ASLTS, is this the correct page to start: https://wiki.linaro.org/LEG/Engineering/test-acpi Ah, ASLTS is in the ACPICA tree. Got it.

zetalog commented 7 years ago

The output of acpidbg was already given

Yes, it's given for the "failure" cases. I'm double checking if the "pass" case result is similar.

pass means, can boot to login screen without BSOD. Do you have more test cases?

What we need here is the amli boot log to see what has been executed by Windows.

Lekensteyn commented 7 years ago

Ok, I will obtain an AMLI log, but it may take more time to prepare it. Should have it in an hour though.

zetalog commented 7 years ago

For the ASLTS, is this the correct page to start: https://wiki.linaro.org/LEG/Engineering/test-acpi

I think you can add code in ACPICA upstream at: tests/aslts/src/runtime/collections/functional/ Better to add a new test suite, for example "forwardref" here. Before adding it to ASLTS, making sure that the cases can be supported by the Windows interpreter.

Here is a similar forward reference test case example: tests/aslts/src/runtime/collections/functional/table TLD0.tstg If you can provide the best coverage in the new test suite, please also help to move this case there (possibly tests/aslts/src/runtime/collections/functional/forwardref).

zetalog commented 7 years ago

Ok, I will obtain an AMLI log, but it may take more time to prepare it. Should have it in an hour though.

That helped a lot. We can clearly see what has been executed by the Windows during table loading. See if NVP2 in the package has been evaluated when _PR2 is parsed during table loading.

And can see if Windows loads all of the tables. After booting, please also dump namespace from the amli console and make it a part of the amli log. AFAICR, there should be a command to dump namespace in amli console.

zetalog commented 7 years ago

Is the returning result of the following command identical for fail/pass tests:

# /tmp/acpidbg -b 'evaluate \_SB.PCI0.PEG1._PR0'
# /tmp/acpidbg -b 'evaluate \_SB.PCI0.PEG1._PR2'
# /tmp/acpidbg -b 'evaluate \_SB.PCI0.PEG1._PR3'
Lekensteyn commented 7 years ago

Setting up AMLi locally is a bit more effort than expected, it will take more time.

This is the pass case on Linux (without If):

[    0.129135] [ACPI Debug]  "PR_TEST: _S0W"
[    0.134514] [ACPI Debug]  "PR_TEST: NVP3._ON"
[ACPI Debug]  "PR_TEST: _PS0"
+ ls -d /sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D0 /sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D2 /sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D3hot
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D0
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D2
/sys/bus/pci/devices/0000:00:12.0/firmware_node/power_resources_D3hot
+ acpidbg -b evaluate \_SB.PCI0.S90._PR0
Evaluating \_SB.PCI0.S90._PR0
Evaluation of \_SB.PCI0.S90._PR0 returned object ffff880004c38000, external buffer length 30
 [Package] Contains 1 Elements:
  [Object Reference] = ffff8800060754d0 <Node>          Name NVP3 Power ffff8800061256e8
+ acpidbg -b evaluate \_SB.PCI0.S90._PR2
Evaluating \_SB.PCI0.S90._PR2
Evaluation of \_SB.PCI0.S90._PR2 returned object ffff880005188000, external buffer length 30
 [Package] Contains 1 Elements:
  [Object Reference] = ffff8800060755b0 <Node>          Name NVP2 Power ffff880006125888
+ acpidbg -b evaluate \_SB.PCI0.S90._PR3
Evaluating \_SB.PCI0.S90._PR3
Evaluation of \_SB.PCI0.S90._PR3 returned object ffff880005190000, external buffer length 30
 [Package] Contains 1 Elements:
  [Object Reference] = ffff8800060754d0 <Node>          Name NVP3 Power ffff8800061256e8

This is the fail case (with OSI):

[    0.098297] [ACPI Debug]  "In block"
[    0.135502] [ACPI Debug]  "PR_TEST: _S0W"
+ ls -d /sys/bus/pci/devices/*/firmware_node/power_resources*
ls: /sys/bus/pci/devices/*/firmware_node/power_resources*: No such file or directory
+ acpidbg -b evaluate \_SB.PCI0.S90._PR0
Evaluating \_SB.PCI0.S90._PR0
Evaluation of \_SB.PCI0.S90._PR0 returned object ffff880004d78000, external buffer length 18
 [Package] Contains 0 Elements:

+ acpidbg -b evaluate \_SB.PCI0.S90._PR2
Evaluating \_SB.PCI0.S90._PR2
Evaluation of \_SB.PCI0.S90._PR2 returned object ffff8800053d8000, external buffer length 18
 [Package] Contains 0 Elements:

+ acpidbg -b evaluate \_SB.PCI0.S90._PR3
Evaluating \_SB.PCI0.S90._PR3
Evaluation of \_SB.PCI0.S90._PR3 returned object ffff880005384000, external buffer length 18
 [Package] Contains 0 Elements:

(I have stripped all ACPI Exception: AE_ERROR, While parsing command line (20160831/dbinput-1213) messages for clarity, that message is always shown after executing an acpidbg command. Probably a bug.)

zetalog commented 7 years ago

All seems to be reasonable. For the code below:

    If (1)
    {
        Scope (\_SB.PCI0.PEG1)
        {
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })
        }
    }
    PowerResource (NVP2, 0x00, 0x0000)
    {
        ...
    }

When acpi_gbl_parse_table_as_term_list = FALSE. The parsing order is:

  1. During table loading, If opcode cannot be handled, and gets linked as MLC.
  2. Then NVP2 is created during table loading.
  3. During MLC execution, _PR2 is created, and NVP2 is found, thus _PR2 contains elements. The deferred MLC execution was just a Linux bug violating the spec. It is designed in the deferred way to avoid the locking issues. Now we have solved the table locking issues, it doesn't have to defer.

When acpi_gbl_parse_table_as_term_list = TRUE. The parsing order is:

  1. During table loading, If opcode can be handled, _PR2 is created, and NVP2 is not found (IMO, Windows will complain an ACPI incompliant BIOS and ends booting with blue screen here), thus _PR2 contains no element (as Linux continues to boot).
  2. Then NVP2 is created during table loading.
  3. There is no deferred MLC execution.

The latter is the preferred behavior as it reflects the correct grammar. But we need proofs.

IMO, for the following case:

    If (1)
    {
        Scope (\_SB.PCI0.PEG1)
        {
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2 <- Windows blue screen here?
            })
            Store (_PR2, Debug) <- Windows blue screen here?
        }
    }
    PowerResource (NVP2, 0x00, 0x0000)
    {
        ...
    }

Windows should always ends booting with blue screen, either at the packaged NVP2 or at the Store line. If Windows returns blue screen inside of Package, then this report is invalid, Linux needn't do anything. If Windows returns blue screen at the Store line, then we need to improve ACPICA and better to prepare an ASLTS case that has been validated via Windows probing to avoid regressions.

Lekensteyn commented 7 years ago

Ok, it took a while, finally I have an Win10 AMLi log (using the SSDT used for above testing on Linux). A quick analysis shows something interesting, it seems that Windows treats the Package elements as string. Searching for NVP|_PR[0-3]:

During SSDT loading everything seems normal:

ffffe0004e687398: If(\_OSI("Windows 2013")
ffffe00053801562: {
ffffe00053801562: Return(OSI(Arg0)=0xffffffff)
ffffe00053801565: })
ffffe0004e6873ae: {
ffffe0004e6873ae: Store("In block",Debug)String(:Str="In block")
="In block"
ffffe0004e6873bb: Scope(\_SB_.PCI0.S90_)
ffffe0004e6873cd: {
ffffe0004e6873cd: Name(_PR0,
ffffe0004e6873d3: Package(0x1)
ffffe0004e6873d5: {
ffffe0004e6873d5: | NVP3
ffffe0004e6873d9: })
ffffe0004e6873d9: Name(_PR2,
ffffe0004e6873df: Package(0x1)
ffffe0004e6873e1: {
ffffe0004e6873e1: NVP2
ffffe0004e6873e5: })
ffffe0004e6873e5: Name(_PR3,
ffffe0004e6873eb: Package(0x1)
ffffe0004e6873ed: {
ffffe0004e6873ed: NVP3
ffffe0004e6873f1: })
ffffe0004e6873f1: Method(_S0W,0x0)
ffffe0004e687405: Method(_DSW,0x3)
ffffe0004e68740c: Method(_PS0,0x0)
ffffe0004e68741d: Method(_PS3,0x0)
ffffe0004e68742e: }
ffffe0004e68742e: PowerResource(NVP3,0x0,0x0)
ffffe0004e687439: {
ffffe0004e687439: Name(_STA,One)
ffffe0004e68743f: Method(_ON_,0x0)
ffffe0004e68745a: Method(_OFF,0x0)
ffffe0004e687476: }
ffffe0004e687476: PowerResource(NVP2,0x0,0x0)
ffffe0004e687481: {
ffffe0004e687481: Name(_STA,One)
ffffe0004e687487: Method(_ON_,0x0)
ffffe0004e6874a2: Method(_OFF,0x0)
ffffe0004e6874be: }
ffffe0004e6874be: }

A bit later we see:

AMLI: ffffe0004e6d4040: AsyncEvalObject(\NVP3._STA)=0x1
AMLI: ffffe0004e6d4040: AsyncEvalObject(\NVP2._STA)=0x1
AMLI: EvalObject(\NVP3._STA)=Integer(:Value=0x1[1])
AMLI: EvalObject(\NVP2._STA)=Integer(:Value=0x1[1])

Here is gets interesting, it looks like a string (?):

AMLI: FFFFE0004E6D4040: \_SB.PCI0.S90._DSW(0x0,0x0,0x0)
AMLI: ffffe0004e6d4040: AsyncEvalObject(\_SB.PCI0.S90._PR0)
=Package(1){"NVP3"}
AMLI: EvalObject(\_SB.PCI0.S90.S00._ADR)=Integer(:Value=0x0[0])

Yes... it really does think it is a string?

AMLI: EvalObject(\_SB.PCI0.S90._PR0)=
Package(:NumElements=1){
| String(:Str="NVP3")
,}
AMLI: ffffe0004e6d4040: AsyncEvalObject(\_SB.PCI0.S90._PR2)
=Package(1){"NVP2"}

So maybe it is doing a reference by string name, not an immediate reference? The remaining NVP3/PR matches:

AMLI: EvalObject(\_SB.PCI0.S90._PR2)=
Package(:NumElements=1){
| String(:Str="NVP2")
,}
AMLI: ffffe0004e6d4040: AsyncEvalObject(\_SB.PCI0.S90._PR3)
=Package(1){"NVP3"}
...
AMLI: EvalObject(\_SB.PCI0.S90._PR3)=
Package(:NumElements=1){
| String(:Str="NVP3")
,}
...
AMLI: ffffe0004e6d4040: AsyncEvalObject(\NVP3._ON)
...
AMLI: ffffe0004e6d4040: AsyncEvalObject(\NVP2._OFF)
...
AMLI: ffffe0004e6b4040: AsyncEvalObject(\NVP3._ON)

The full log is here: output3-head.txt

Hypothesis: maybe Microsoft stores a string (instead of a reference) if it cannot resolve the name. Test: change the order (define PowerResource before _PR3). Result: no difference, same issue (package elements are still strings). Log: output-pr-first.txt

Testing your new case (Debug = _PR2) gives no issues (I had to add a _PR0 method though or it would BSOD after logging "does not support D0 power state!"):

ffffe00012c87395: Scope(\_SB_.PCI0.S90_)
ffffe00012c873a6: {
ffffe00012c873a6: Name(_PR0,
ffffe00012c873ac: Package(0x1)
ffffe00012c873ae: {
ffffe00012c873ae: | NVP2
ffffe00012c873b2: })
ffffe00012c873b2: Name(_PR2,
ffffe00012c873b8: Package(0x1)
ffffe00012c873ba: {
ffffe00012c873ba: NVP2
ffffe00012c873be: })
ffffe00012c873be: Store(_PR2=Package(1){"NVP2"},Debug)
Package(:NumElements=1){
| String(:Str="NVP2")
,}
=Package(1){"NVP2"}
ffffe00012c873c5: }
ffffe00012c873c5: PowerResource(NVP2,0x0,0x0)
ffffe00012c873d0: {
ffffe00012c873d0: Name(_STA,One)
ffffe00012c873d6: Method(_ON_,0x0)
ffffe00012c873f0: Method(_OFF,0x0)
ffffe00012c8740b: }
ffffe00012c8740b: }

Also tested Debug = NVP2, this resulted in a BSOD with this log:

ffffe000cbe87395: Scope(\_SB_.PCI0.S90_)
ffffe000cbe873a6: {
ffffe000cbe873a6: Name(_PR2,
ffffe000cbe873ac: Package(0x1)
ffffe000cbe873ae: {
ffffe000cbe873ae: | NVP2
ffffe000cbe873b2: })
ffffe000cbe873b2: Store(NVP2
ParseAndGetNameSpaceObject: object NVP2 not found
ffffe000cbe873b7: }

ACPI 6.1, section 19.6.99 says that "Named References to non-Data Objects cannot be resolved to values. They are instead returned in the package as references: ... Power Resource reference"...

Based on a look on the AML grammar (19.2.2 for NameString/NameSeg, 20.2.5.4 for DefPackage) and the hex dump, it seems that we have:

Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })

  00B6: 12 06 01 4E 56 50 32                             // ...NVP2
 12 - PackageOp
 06 - PkgLength
 01 - NumElements
 "NVP2" - PackageElement / NameString

that could explain why the AMLi log says "string". But it is not really a string... maybe this led to the confusion, and Microsoft implemented "reference" as something that is evaluated later?

zetalog commented 7 years ago

it seems that Windows treats the Package elements as string.

That makes perfect senses. :) NVPx is not forward referenced, there is no referencing performed for NVPx when _PRx is created. Great analysis. Will you prepare ASLTS for us around such cases?

Microsoft implemented "reference" as something that is evaluated later?

IMO, we still can have several assumptions that need to be clarified before making decisions.

  1. Maybe the trick is behind the "NameString", which means a reference is simply also a string..
  2. Or, if a reference can be found, it is a "Reference to the object", otherwise, it will be treated as "String" at that time. I'm curious that if NVP2 is not a forward reference but a backward reference, will MS still treat it as "String"?
  3. And we need to confirm: for a non-predefined package (not _PRx, which is predefined in the spec, it is possible for OSPMs to provide workarounds for such predefined stuffs as they know the definition of the objects), will the interpreter still treat the references as strings? Which means shall we spread this to all Pakcage(s), or even NameString(s), or just to add a workaround to fix _PRx (in ACPICA, we have nspredef module to introduce workarounds to fix only predefined stuffs).

Could you help to figure this out via amli?

Hypothesis: maybe Microsoft stores a string (instead of a reference) if it cannot resolve the name. Test: change the order (define PowerResource before _PR3 ). Result: no difference, same issue (package elements are still strings).

So for the above 3 assumptions, you have assumption 2 answered. Please help to check "assumption 3".

Testing your new case ( Debug = _PR2 ) gives no issues

ffffe00012c873be: Store(_PR2=Package(1){"NVP2"},Debug)
Package(:NumElements=1){
| String(:Str="NVP2")
,}
=Package(1){"NVP2"}

Looks like the NVP2 is still String here. Have you tried "Debug = _PR0" instead as after you changed the case, _PR0 seems to be the 1st _PRx?

Another case is: Will the dumped content be changed to "Reference" after Windows evaluates _PRx? I mean: will _PRx content be changed back to Reference after evaluating _PRx by the Windows power manager? I don't know how to trigger this test. Maybe you can prepare a Method (_PR3, ..) (note, not a package), and put the "Debug = _PR0" inside of it. So that when _PR3 is evaluated by the power manager, _PR0 surely should have been resolved before. Let's see what can be dumped via amli when _PR3 method is evaluated by Windows.

Also tested Debug = NVP2 , this resulted in a BSOD with this log:

ffffe000cbe873b2: Store(NVP2
ParseAndGetNameSpaceObject: object NVP2 not found
ffffe000cbe873b7: }

What does "BSOD" mean here? Will you see a blue screen here or simply a failure message? Is there a way for you to confirm what will be output to the "Debug" object if it's not a blue screen? If this is a blue screen, we can see acpi_gbl_parse_table_as_term_list = FALSE is still buggy. For the old Linux behavior, if the "Debug = NVP2" is in the module level If (1) block, NVP2 can be found. So we really don't want to go back to old craps, we need to improve in the new direction.

ACPI 6.1, section 19.6.99 says that "Named References to non-Data Objects cannot be resolved to values. They are instead returned in the package as references: ... Power Resource reference"...

This looks more like: MS may think a "reference" is an inherited object of a "string". Which seems to be similar to assumption 1.

Thanks Lv

zetalog commented 7 years ago

Noticed this in your report:

 ACPI Exception: AE_ERROR, While parsing command line (20160831/dbinput-1213) 

It's just a design conflict in ACPICA debugger. Originally ACPICA debugger stores input/output into increasing buffers (horrible in-kernel re-allocation against the output whose size may be infinite). While new debugger is a character device based (no buffer allocation is needed), relying on the fact that reading 0 sized input means the end of the stream. So the error is valid for old design but is not valid for the new design and needs to be cleaned up.

Lekensteyn commented 7 years ago

Have you tried "Debug = _PR0" instead as after you changed the case, _PR0 seems to be the 1st _PRx?

No I have not tried, but I do not think that Microsoft is trying to play a smartass with exceptional cases and just implemented whatever seemed the easiest.

It seems unlikely that Microsoft created a special case for _PR3. I have adopted the example from the ACPI spec (16.6.99) to:

DefinitionBlock ("", "SSDT", 1, "OEM", "OEMID", 0x00001000) {
    External (\_SB.CPUS.C000, ProcessorObj)

    Name (INT1, 0x1234)
    PowerResource (PWR1, 0, 0) {
        Name (_STA, 1)
        Method (_ON, 0, NotSerialized) {
            Debug = "_ON"
            Debug = PKG1
        }
        Method (_OFF, 0, NotSerialized) {
            Debug = "_OFF"
            Debug = PKG1
        }
    }

    Name (PKG1, Package () {
        0x3400,
        "Processor",
        \INT1,
        \_SB.CPUS.C000, //\CPU0,
        Package () {
            0x4321,
            \INT1,
            PWR1
        }
    })
    Debug = PKG1
}

All objects (Processor, Power Resource, Integer references) are still treated as string (and printing them later in _ON does not change the debug output):

Package(:NumElements=5){
| Integer(:Value=0x3400[13312])
,| String(:Str="Processor")
,| String(:Str="\INT1")
,| String(:Str="\_SB_.CPUS.C000")
,| Package(:NumElements=3){
| | Integer(:Value=0x4321[17185])
,| | String(:Str="\INT1")
,| | String(:Str="PWR1")
,| }
,}

Will the dumped content be changed to "Reference" after Windows evaluates _PRx? I mean: will _PRx content be changed back to Reference after evaluating _PRx by the Windows power manager?

I doubt it, but let's have a look at another approach to test references.

The comments in the ASL below include results ("expected ..., got (AMLi result)") from the AMLI log (debug-package-index-pr-type-head.txt):

DefinitionBlock ("", "SSDT", 1, "OEM", "OEMID", 0x00001000) {
    External (\_SB.CPUS.C000, ProcessorObj)

    Name (INT1, 0x1234)
    Name (PKG1, Package () {
        0x3400,
        "Processor",
        \INT1,
        \_SB.CPUS.C000, //\CPU0,
        Package () {
            0x4321,
            \INT1,
            PWR1
        },
        PWR1
    })
    Debug = PKG1
    Debug = ObjectType (Index (PKG1, 0)) // 0x3400 - expected Integer (1), got expected
    Debug = ObjectType (Index (PKG1, 2)) // \INT1 - expected Integer (1)?, got String (2)
    Debug = ObjectType (Index (PKG1, 3)) // CPU - expected Processor (12), got String (2)
    Debug = ObjectType (Index (PKG1, 4)) // Package - expected 4 (Package), got expected
    Debug = ObjectType (Index (PKG1, 5)) // PWR1 - expected Power Resource (11), got String (2)

    PowerResource (PWR1, 0, 0) {
        Name (_STA, 1)
        Method (_ON, 0, NotSerialized) {
            Debug = "_ON"
            Debug = PKG1
            Debug = ObjectType (Index (PKG1, 5)) // expected Power Resource (11), got String (2)
        }
        Method (_OFF, 0, NotSerialized) {
            Debug = "_OFF"
            Debug = PKG1
            Debug = ObjectType (Index (PKG1, 5))
        }
    }
    Debug = \_SB.CPUS.C000
    Debug = ObjectType (\_SB.CPUS.C000) // expected Processor (12), got expected
    Debug = PWR1
    Debug = ObjectType (PWR1) // expected Power Resource (11), got expected
}

As you can see, "references" are always visible as String type. I guess they are only evaluated ("dereferenced"?) when the user needs it (for example the user of _PR3 expects a Power Resource). This should also answer your BSOD question, using Debug = NVP2 before defining the power resource results in a BSOD, using it afterwards is totally fine.

zetalog commented 7 years ago

As you can see, "references" are always visible as String type.

OK, I'll use this as the conclusion. So we are trapped by an old issue - Reference implemented by ACPICA isn't compliant to Windows implementation. That really can make things look like forward referernces...

BTW, will you prepare ASLTS for us around such cases? We need well designed and validated test cases in ASLTS to catch regressions.

I guess they are only evaluated ("dereferenced"?) when the user needs it (for example the user of _PR3 expects a Power Resource).

We did have similar assumption before. However, supporting this will change Linux a lot. If ACPICA starts to return String for "NameString", all user side Linux code need to be changed accordingly.

Thanks Lv

Lekensteyn commented 7 years ago

As you can see, "references" are always visible as String type.

BTW, will you prepare ASLTS for us around such cases?

Yes, I will give it a try. Working on it right now, but I will need your help later to see if it is in the correct format.

If ACPICA starts to return String for "NameString", all user side Linux code need to be changed accordingly.

Just NameString for Package elements or for all (MethodInvocation, DefAlias, ...)?

Tested Alias(PWR1, PWR2) and it fails with GetNameSpaceObject: object PWR1 not found during loading of the table. I guess it is really a special thing for Package.

Did another test where I made the _PR2 method return a string (compile with iasl -f because it is illegal grammar):

    Scope (\_SB.PCI0.S90) {
        Name (_PR0, Package (0x01) { PWR1 })
        Name (_PR2, Package (0x01) { "PWR2" })
        Name (_PR3, Package (0x01) { PWR1 })
    }

and it shows that Windows is still able to boot successfully. In the AMLi log (debug-pr-string-head.txt) there is no difference between the named reference and string.

With Name (_PR3, Package (0x01) { INT1 }), we can find that this results in a BSOD after:

AMLI: EvalObject(\_SB.PCI0.S90._PR3)=
Package(:NumElements=1){
| String(:Str="INT1")
,}
FFFFE0011C8C4800 120000 ACPIBuildDevicePowerNodes: INT1 references bad power object.

Another BSOD with complete garbage string (same error occurs with syntatically valid, but non-existing name AAAA):

AMLI: EvalObject(\_SB.PCI0.S90._PR2)=
Package(:NumElements=1){
| String(:Str="HAHA DIS DOES NUT WUKK")
,}
FFFFE00050CC48D0 120000 ACPIBuildDevicePowerNodes: HAHA DIS DOES NUT WUKK Status = c0000034

Another test with two PowerResources having the same name (PWR3), but under different scopes (\_SB.PCI0 and \) and the _PR0 under \_SB.PCI0.S90 shows that \_SB.PCI0.PWR3 is the resulting object (based on the printed paths and Debug strings). This further supports the idea that a string is stored which is then evaluated in the context of the user. Logs: debug-pr-duplicate-head.txt (with _PR3 being PWR3), debug-pr-duplicate-pr0-head.txt (with _PR0 being PWR3)

hhfeuer commented 7 years ago

Are you telling me Lenovo has to set the force switch set to get their acpi tables compiled?

Lekensteyn commented 7 years ago

@hhfeuer No they did not. I needed the force option because I was testing Microsoft's ACPI implementation, but that has nothing to do with the Lenovo BIOS.

mirh commented 7 years ago

Mhh.. I wonder if this might also have something to do with

ACPI Error: Needed [Integer/String/Buffer], found [Reference] ffff8800a106c0d8 (20160831/exresop-425)
ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [OpcodeName unavailable] (20160831/dswexec-461)
ACPI Error: Method parse/execution failed [\_SB.PCI0.VGA.AF03] (Node ffff8800a30a30a0), AE_AML_OPERAND_TYPE (20160831/psparse-543)
ACPI Error: Method parse/execution failed [\_SB.PCI0.VGA.ATIF] (Node ffff8800a30a35a0), AE_AML_OPERAND_TYPE (20160831/psparse-543)

that happens with fglrx driver on my laptop then. EDIT: possibly related problems for nvidia users

Lekensteyn commented 7 years ago

@mirh The error AE_AML_OPERAND_TYPE looks strange, can you file a new bug on https://bugzilla.kernel.org/ for this? This tracker is not really the approprate place to report such bugs. Be sure to include your acpidump and preferable full dmesg.

By the way, I have reported the ACPICA issue described in this report at https://bugs.acpica.org/show_bug.cgi?id=1333.

mirh commented 7 years ago

It's already a miracle 4.9 boots with fglrx tbh 😛

Anyway.. Shouldn't have I better to wait for acpica/acpica#122 to land first?