cubieplayer / Cubian

Debian for Cubieboard
http://cubian.org

"PHY_PageRead : too much ecc err" with Cubian X1 on CB2 -- NAND access is broken #427

Open LazebnyV opened 9 years ago

LazebnyV commented 9 years ago

Hello. I'm having a problem with the onboard NAND memory on a Cubieboard 2 running the stock Cubian X1 distribution. The steps to reproduce are as follows:

  1. Extract the image "Cubian-nano+headless-x1-a20.img.7z" to a raw img file and write the resulting file to a microSD card.
  2. Boot from this microSD card.
  3. Log in with cubie/cubie.
  4. Issue "sudo dd if=/dev/nand of=/dev/null bs=1M count=1" command.

This command reads one mebibyte of data from the beginning of the NAND block device and discards it. dd reports a very low speed of 800-900 kilobytes/s. Immediately afterwards, dmesg fills with messages containing the string "PHY_PageRead : too much ecc err". Writing to the NAND fails as well: the data is not fully written, which can be confirmed by writing, reading back, and comparing checksums.

This is reproducible on 5 different boards, so I think it is a software problem rather than a faulty chip. Also, the previous "Cubian A5" distribution could read the internal memory on the same boards at about 14 MB/s (measured by reading 1 MiB from the NAND with the same dd command), and no such errors appeared.

This happens with both prebuilt images of Cubian X1: "A20 Nano" (filename=Cubian-nano-x1-a20-hdmi.img.7z, SHA1=b39752aefde16b883d13a48eeb55fec9029a3520, size=161795222) and "A20 Nano Headless" (filename=Cubian-nano+headless-x1-a20.img.7z, SHA1=42d789781d628f1821f99ea640a22c4b59a5cd6f, size=162457137).
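For anyone who wants to repeat the measurements described above, here is a minimal Python sketch of the two checks: a timed read of the first mebibyte, and a write, read-back and SHA-1 comparison. The function names `timed_read` and `write_and_verify` are made up for this illustration, and the device path is whatever you pass in. Two caveats: writing to /dev/nand destroys whatever is stored at that offset, and on a real block device the read-back may be served from the page cache rather than the flash, so treat a passing result with caution.

```python
import hashlib
import os
import time


def timed_read(path, nbytes=1024 * 1024, chunk=64 * 1024):
    """Read up to nbytes from the start of path; return (bytes_read, seconds)."""
    done = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while done < nbytes:
            buf = f.read(min(chunk, nbytes - done))
            if not buf:  # end of file or device reached early
                break
            done += len(buf)
    return done, time.monotonic() - start


def write_and_verify(path, payload, offset=0):
    """Write payload at offset, read it back, and compare SHA-1 digests."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the OS write buffers
    with open(path, "rb") as f:
        f.seek(offset)
        readback = f.read(len(payload))
    return hashlib.sha1(payload).digest() == hashlib.sha1(readback).digest()
```

A call such as `timed_read("/dev/nand")` (run as root) would correspond roughly to the dd command in step 4; dividing the byte count by the elapsed seconds gives a throughput figure to compare against the numbers quoted above.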

Please advise: how can we use the internal memory with the Cubian distribution? Is there a kernel patch to fix this problem?

danfos commented 9 years ago

Yes, if this was not a problem with the previous "Cubian A5" distribution and you now see it on 5 boards with Cubian X1, it is very likely caused by a NAND driver change.

In my "Cubian A5" log I see:

[    0.000000] Linux version 3.4.75-sun7i ...
....
[NAND] nand driver version: 0x2 0x9
[    2.407488] [NAND] nand driver version: 0x2 0x9

What do you see for Cubian X1?

NB: See http://linux-sunxi.org/Linux_Kernel for kernel development.

michalliu commented 9 years ago

Thanks for the feedback. The NAND driver we use is from the 3.4.79 kernel, https://github.com/mmplayer/linux-sunxi/ dev/sunxi-3.4 branch. I will take a look at this problem when I have time.

LazebnyV commented 9 years ago

@danfos Cubian X1 also reports "[NAND] nand driver version: 0x2 0x9". There may have been changes in between that did not lead to a version increment.

danfos commented 9 years ago

@michalliu

It would be good if you could look into this issue, but could you (or someone else) also share the git tags of "Cubian A5" and "Cubian X1"? That would enable people to look at the differences and maybe help.

LazebnyV commented 9 years ago

By the way, sorry for causing some confusion. I looked up the name of the earlier distribution and it was called "Cubian r5" for A20, not "A5". The filename was "Cubian-base-r5-a20.img".

solstag commented 9 years ago

I'm having this issue as well. Issue #376 seems very related to this.

michalliu commented 9 years ago

@danfos maybe this link helps: https://github.com/cubieplayer/Cubian/issues/381
Cubian X1 kernel: https://github.com/mmplayer/linux-sunxi/
nandinstall package: https://github.com/mmplayer/cubian-packages/tree/master/cubian-nandinstall

zimmy73 commented 9 years ago

Hello, I have 7 Cubieboard 2 (version B) units, running:

root@cubie:/var/log# cat /etc/issue
Debian GNU/Linux 7 \n \l
root@cubie:/var/log# uname -r
3.4.79-sun7i

One extra piece of information: besides the OS, I have installed only smokeping on the cubies.

I'm experiencing the same problem on all 7 units.

The following messages are displayed in the kern.log, syslog and messages files:

Jan 14 06:45:08 cubie kernel: [56559.985124] PHY_PageRead : too much ecc err,bank 0 block 1a,page 4
Jan 14 06:55:09 cubie kernel: [57160.301331] PHY_PageRead : too much ecc err,bank 0 block 35a,page e6
Jan 14 06:55:20 cubie kernel: [57160.340312] PHY_PageRead : too much ecc err,bank 0 block 35a,page ef
Jan 14 08:10:04 cubie kernel: [61655.204267] PHY_PageRead : too much ecc err,bank 0 block 31b,page df
Jan 14 08:10:04 cubie kernel: [61655.280410] PHY_PageRead : too much ecc err,bank 0 block 31b,page ee
Jan 14 08:10:04 cubie kernel: [61655.332995] PHY_PageRead : too much ecc err,bank 0 block 31b,page f6
Jan 14 08:40:06 cubie kernel: [63457.850942] PHY_PageRead : too much ecc err,bank 0 block 376,page ee
Jan 14 09:05:05 cubie kernel: [64956.817050] PHY_PageRead : too much ecc err,bank 0 block 1a,page e2
Jan 14 10:10:07 cubie kernel: [68858.929723] PHY_PageRead : too much ecc err,bank 0 block 341,page ee
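To see whether the errors cluster on particular blocks, log lines like these can be tallied with a short script. This is only a sketch: the names `ECC_RE`, `ecc_errors` and `count_by_block` are invented for this example, and it assumes the block and page numbers are printed in hexadecimal (values such as "1a" and "e6" suggest so). The bank value is kept as the printed string because its base is not obvious from the log.

```python
import re
from collections import Counter

# Matches e.g. "PHY_PageRead : too much ecc err,bank 0 block 1a,page 4"
ECC_RE = re.compile(
    r"PHY_PageRead : too much ecc err,"
    r"bank (\w+) block ([0-9a-fA-F]+),page ([0-9a-fA-F]+)"
)


def ecc_errors(text):
    """Yield (bank, block, page) for every ECC error message in text.

    bank stays a string; block and page are parsed as hex (an assumption).
    """
    for m in ECC_RE.finditer(text):
        yield m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def count_by_block(text):
    """Count ECC errors per (bank, block) pair."""
    return Counter((bank, block) for bank, block, _page in ecc_errors(text))
```

For example, `count_by_block(open("/var/log/kern.log").read()).most_common(5)` would list the five blocks with the most reported errors, assuming the log file is readable by the current user.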

These messages repeat continuously and, after some days, the unit hangs. The only thing I can do is reboot it, but after the restart it cannot boot any more: only the red power light is on, with no output from HDMI. I CAN boot the cubieboard from the SD card, but NOT from the NAND. To solve this problem I need to reflash the NAND with a lubuntu version (using Livesuite) and then reinstall the linux3 img using the SD card and running cubian-nandinstall. Roughly once a month I need to reflash the NAND again. I need your help to understand what is causing this issue and how I can solve it.

I have also written to cubie support, and they told me that the "Nand Install" fails because of the old driver for the NAND flash. They cannot help me; this is their reply:

"This is cubieboard support team Cubian is a third party distrabution built by community enthusiast Please try to get support from the community
The "Nand Install " fail due to the old driver for nand flash, However ,the uboot&nand-flash are allwinner closed source"

Could you help me, please? Many thanks in advance. Regards

solstag commented 9 years ago

I've now noticed that any access to the nand on my board running Cubian from the SSD brings up messages in dmesg starting with:

wrong chip number ,rb_mode = 1, bank = 127, chip = 0, chip info = 1
PHY_PageRead : beyond chip count

zimmy73 commented 9 years ago

Hello all, thanks solstag for your comment. Could you check the kern.log? I've tried the lubuntu version downloaded from the cubietech site: http://dl.cubieboard.org/software/a20-cubieboard/lubuntu/cb-a20-lubuntu-12.10-v1.06/cb2-lubuntu-desktop-20131026/lubuntu-desktop-nand.img.gz, and I have seen that the problem is also present with this img. I now think the problem is not related to Cubian. I've written a new message to cubietech support and I'm still waiting for a reply from them.

Any help from you?

Thanks in advance

solstag commented 9 years ago

It seems to be an issue with the kernel Cubian uses, whose drivers do not correctly identify the NAND chips. The message I copied above is from the kernel log (dmesg). This particular lubuntu you've tried is likely using a kernel with the same limitation. Other versions of lubuntu recognize and work with this NAND chip just fine.

zimmy73 commented 9 years ago

Hi Solstag, first of all many thanks for your message. As you know, I have written to cubietech support and this is their reply: "This problem due to the nand flash physical characteristics defect & nand driver bug So ,the nand storage is not safe for writing&reading data frequently situation Cutting power randomly also may lose important data,this a blog for cb's storange http://cubieboard.org/2014/08/12/how-to-choose-the-storage-media-in-cubieboard/ This is a bug for cubieboard ,but nand driver is no open-source for us ,we can do nothing about it ,I am very sorry about it. So ,if you take cubieboards to a writing&reading data frequently situation,I suggetst you repalce the nand storange with tsd flash ,tsd is a safe storange ,Cubietech also can provide a open sdk to flash your firmware to the tsd ,this is a pack sdk for linux : http://dl.cubieboard.org/model/cubieboard2/Source/cubieez/README Cubietech have release tsd version cubiebaords ,sales@cubietech.com can help ."

So the problem seems to be related to a NAND bug :-( Keep in mind that I see the problem frequently when using smokeping, but I can also reproduce the bug simply by running "apt-get upgrade" without smokeping installed.

You wrote that with some lubuntu versions the cubie works fine, which is good news for me. If you can tell me which lubuntu version you are using and where I can download it, I would appreciate it.

Many thanks

LazebnyV commented 8 years ago

In the end I solved the issue by rebuilding the Cubian kernel and modules from source (http://cubian.org/sources/) and replacing the Cubian X1 versions with the rebuilt ones.

walkin-corpse commented 8 years ago

LazebnyV, I did the same thing (rebuilt the X1 kernel from source with some added modules like w1 etc.) without any success. The NAND still hangs when I boot X1 from a TF card and try to mount it.